CHIN-FEI HSU. Tests for Finite Proper Mixtures of Distributions
(Under the direction of Norman L. Johnson)


A number of hypothesis testing problems are investigated which involve the common element of deciding whether an observed sample can be regarded as coming from a mixture of two or more component distributions. These component distributions may be completely or partially specified, or only estimated from samples. One of Johnson's statistics (1973) is studied for its asymptotic performance, and this method is applied to derive a test statistic for a mixture of three symmetrical components. Next, Thomas' statistic (1969) is modified and used in testing mixtures of two continuous components. It is then compared with Johnson's statistics by calculating the asymptotic power for the case of two normal components against a single normal alternative. Then statistics for testing mixtures of three and four continuous components are derived. Statistics for testing mixtures of more than four components are also discussed, and an algorithm to derive them is obtained. Further, it is shown that these kinds of statistics can be used to test (i) whether it is possible to reduce the number of components in a mixture, and (ii) hypotheses involving two or more mixtures simultaneously. A method of deriving test statistics based on minimizing a Kolmogorov-Smirnov statistic among mixtures of two known components is suggested, and a computational algorithm is constructed. Properties of tests based on these statistics are studied by simulation for some special cases.
CHAPTER I

INTRODUCTION AND SUMMARY

1.1  Motivation 

Mixtures of distribution functions arise frequently in practice, and very often they present difficulties to researchers. As an example, similar items from different sources might be mixed together at a distribution center before they are shipped out in lots for sale. If doubts have been expressed regarding the uniformity of quality of these products, a procedure to test whether lots contain products from two or more sources is desirable. For another interesting example, see Thomas (1969, p. 475).

In this paper a variety of hypothesis testing problems (or models) are investigated which possess the common element that they involve the question whether an observed sample can be regarded as coming from a finite mixture of two or more component distributions. These component distributions may be completely or partially specified, or possibly only estimated from samples.


1.2  Definitions 

A finite mixture is proper if its mixing coefficients are non-negative and sum to 1. More specifically, the distribution $F_0$ is a proper mixture of the distributions $F_1, \dots, F_k$ if there exist

$$\omega_i \ge 0,\quad i = 1, \dots, k, \qquad \sum_{i=1}^{k} \omega_i = 1, \qquad \text{such that} \qquad F_0 = \sum_{i=1}^{k} \omega_i F_i,$$

where the $\omega_i$'s are not necessarily known. Furthermore, if all the mixing coefficients are positive, we call the mixture strictly proper.
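The definition can be made concrete with a small sketch. The Python below checks the properness conditions on the mixing coefficients and builds $F_0 = \sum_i \omega_i F_i$ only when they hold. Normal components are assumed purely for illustration, and the helper names `normal_cdf`, `is_proper`, `is_strictly_proper` and `mixture_cdf` are hypothetical, not taken from this work.

```python
import math
from typing import Callable, Sequence

CDF = Callable[[float], float]

def normal_cdf(mean: float, sd: float) -> CDF:
    """Distribution function of N(mean, sd^2) (illustrative component)."""
    return lambda x: 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def is_proper(weights: Sequence[float], tol: float = 1e-12) -> bool:
    """Mixing coefficients are non-negative and sum to 1 (up to rounding)."""
    return all(w >= 0.0 for w in weights) and abs(sum(weights) - 1.0) <= tol

def is_strictly_proper(weights: Sequence[float], tol: float = 1e-12) -> bool:
    """Proper, and every mixing coefficient is positive."""
    return is_proper(weights, tol) and all(w > 0.0 for w in weights)

def mixture_cdf(weights: Sequence[float], cdfs: Sequence[CDF]) -> CDF:
    """F_0 = sum_i w_i F_i, refusing weights that do not define a proper mixture."""
    if len(weights) != len(cdfs):
        raise ValueError("need one weight per component")
    if not is_proper(weights):
        raise ValueError("mixing coefficients must be non-negative and sum to 1")
    w, F = list(weights), list(cdfs)
    return lambda x: sum(wi * Fi(x) for wi, Fi in zip(w, F))

# Usage: a proper (and strictly proper) two-component normal mixture.
F0 = mixture_cdf([0.3, 0.7], [normal_cdf(-1.0, 1.0), normal_cdf(1.0, 1.0)])
```

A weight vector such as $(0.5, 0.5, 0)$ passes `is_proper` but not `is_strictly_proper`, matching the distinction drawn in the text.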

A finite mixture is identifiable if it can be uniquely expressed as far as (1) the component distributions, (2) their number, and (3) the mixing coefficients are concerned (Behboodian, 1976). A finite identifiable mixture is necessarily proper, but not vice versa.

1.3  Grouping  of  Problems 

Table 1.1 lists testing problems concerning finite proper mixtures. For convenience, related problems, which either (1) are similar in nature or (2) can be treated by a similar method, are grouped together into groups A to F.

The contents of Table 1.1 are illustrated by the following examples.

(i) Problem A2. The conditions are that there are random samples from $F_0$ and $F_1$ respectively, and all the random variables are mutually independent; $F_2$ is known; $F_1$ and $F_2$ are continuous. The null hypothesis $H_0$ is $F_0 = \omega F_1 + (1-\omega)F_2$ properly.

(ii) Problem B3. The conditions are that there is a random sample from $F_0$; $F_1$ and $F_2$ are known and absolutely continuous with densities $f_1$ and $f_2$; $f_0(x,\omega) = \omega f_1(x) + (1-\omega)f_2(x)$; and, for $\omega' > \omega$, $f_0(x,\omega')/f_0(x,\omega)$ is a nondecreasing function of some suitably chosen function $t(x)$. The null hypothesis $H_0$ is $\omega \le \omega_0$.

(iii) Problem F2. The conditions are that there are random samples from $F_{a0}$ and $F_{b0}$; each of $F_{a1}$, $F_{a2}$ and $F_{b2}$ is either known or there is a random sample for it; $F_{a1}$, $F_{a2}$ and $F_{b2}$ are continuous. The null hypothesis $H_0$ is

$$F_{a0} = \omega_a F_{a1} + (1-\omega_a)F_{a2} \ \text{properly}, \qquad F_{b0} = \omega_b F_{a1} + (1-\omega_b)F_{b2} \ \text{properly}.$$

Problems in group A test the same kind of null hypothesis, i.e. that of a proper mixture. Problems in group B have the common element that the component distributions are symmetric and differ only by a location shift, a scale change, or both. Though these conditions seem more restrictive than those in group A, the condition of continuity is not required. The null hypotheses of problems in group C deal with "restricted" proper mixtures, i.e. the mixing proportion parameter $\omega$ lies only in a restricted (or proper) subinterval $[a,b]$ of $[0,1]$ rather than in the entire unit interval. Problems in group D are similar to goodness-of-fit tests, except that (i) the sample(s) is (are) from the distribution(s) ($F_0$ or $F_1$) other than the one ($F_2$) we intend to test, and (ii) we are given the additional condition that $F_0$ is a proper mixture of $F_1$ and $F_2$. Problem E1 is that of reducing the number of components of a given finite proper mixture. Finally, problems in group F test two mixtures simultaneously.

1.4 Summary of the Results in Chapters II to V

In Chapter II we deal mainly with problems of location mixtures. First we derive the third and fourth cumulants of a statistic proposed by Johnson (1973) and find that the third cumulant is a constant (with respect to $\omega$ and $n$) multiple of $n^{-2}(1-2\omega)$ (Proposition 2.1), while the fourth cumulant is entirely independent of $\omega$ (Proposition 2.2), provided that $\Pr\{X_1 = (m_1+m_2)/2\} = 0$, where $m_i$ is the mean of $F_i$, $i = 1, 2$, and $X_1$ is a random variable having distribution $F_0$.

Next, assuming that $F_i$ is distributed as $N(m_i, \sigma^2)$ for $i = 1, 2$ (where $N(m_i, \sigma^2)$ means normal with mean $m_i$ and variance $\sigma^2$), we compute the third ($\gamma_1$) and fourth ($\gamma_2$) standardized cumulants of Johnson's statistic for various values of $\Delta = (m_2 - m_1)/\sigma$, and find that for sample size $n \ge 100$, $|\gamma_1| < .029$ and $|\gamma_2| < .01131$. Furthermore, we use a 4-term Gram-Charlier series expansion to approximate the power of Johnson's statistic with respect to a single normal alternative, and find that the values of power are very close to those calculated by Johnson. In section 2.3 we propose a statistic which extends that of Johnson to the case of three symmetric components, and derive a large sample test. Next, in section 2.4, we derive an approximate formula for the power of a test for a two-component mixture against a "proportional" three-component alternative.

In Chapter III we first describe a statistic proposed by Thomas (1969), in the form used by Hariton (1972), and modify it to obtain large sample (nonparametric) tests for problems A2 and A3 respectively. Then we compute the approximate power of the modified test in the case of a mixture of two normal components against a single normal alternative, and compare its performance with Johnson's tests.

In sections 3.5 and 3.6 we extend the method of deriving a test statistic for a two-component mixture to obtain test statistics for three- and four-component mixtures, and find that large sample tests can easily be formulated for problem A3 (when all component distributions are known) but not for problems A1 and A2.
This suggests that we might modify Thomas' statistic. By regarding a homogeneous population as a "one-component mixture" and using a Mann-Whitney statistic to test whether the population distribution is a specific one, the testing problem can be embedded into problem A1 (or A2, or A3) with $k = 1$. By doing so, we find an interesting algorithm which enables us to obtain a test statistic for a $(k+1)$-component mixture from that for a $k$-component mixture, when $k = 1, 2, 3$, provided that each component distribution under consideration is either known or there is a random sample from it. Application of this algorithm to derive a test statistic for a 5-component mixture from that for a 4-component mixture appears in section 3.7. In the rest of Chapter III we deal with problem E1, which involves reducing the number of components of a given finite proper mixture. Two special cases are fully investigated under the condition that each component distribution is known, i.e. (1) reduction from 4 components to 3 components, and (2) reduction from 3 components to 2 components. Large sample tests for these two cases are also obtained.

In Chapter IV we investigate by the Monte Carlo method the properties of a statistic $D$, which we propose for use in problems involving proper mixtures of two known (or specified) component distributions, e.g. problems A3, B1 and D2. First we define this statistic formally, then rearrange it in another form which is easier to manipulate computationally (see (4.1.4)). In section 4.2 we describe a computer algorithm to calculate, from a given random sample, the values of both $D$ and $\hat\omega$, an estimator of the actual mixing proportion parameter in the mixture. In section 4.3 we study three numerical examples, namely when the two component distributions are respectively:

(i) $F_1 \sim N(-1.5, 1)$ and $F_2 \sim N(1.5, 1)$

(ii) $F_1 \sim N(-2, 1)$ and $F_2 \sim N(2, 1)$

(iii) $F_1 \sim E(1.5)$ and $F_2 \sim N(1, 1/16)$

where "$E(a)$" means "exponential with mean $a$". In order to calculate the empirical distribution function (e.d.f.) of $D$, it is first expressed as a linear combination of conditional e.d.f.'s (see (4.1.6)). Then 200 samples each of sizes 10, 20 and 40 for example (i), and of sizes 10 and 20 for examples (ii) and (iii), are generated to calculate each individual conditional e.d.f. From these examples we find that the averages of the values of the estimator $\hat\omega$ over the 200 generated samples in each conditional e.d.f. are very close to the actual proportion parameter $\omega$, provided that $\omega$ is not too close to either end of the unit interval $[0,1]$. To see how the statistic $D$ performs, we also calculate for each example approximate values of the actual significance levels of $D$ using nominal significance levels $\alpha = .05$ and $.025$ (see Tables 4.5 to 4.7).
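A rough sketch of the kind of statistic studied in Chapter IV follows. It assumes, following the abstract, that $D$ is the Kolmogorov-Smirnov distance minimized over proper mixtures $\omega F_1 + (1-\omega)F_2$ of two known components, and that $\hat\omega$ is the minimizing $\omega$. The minimization is a plain grid search chosen for transparency. It is not the computational algorithm of section 4.2, and the function names are hypothetical.

```python
import math
import random

def normal_cdf(mean: float, sd: float):
    """Distribution function of N(mean, sd^2)."""
    return lambda x: 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_distance(sorted_xs, G) -> float:
    """sup_x |F_n(x) - G(x)| for a continuous G, checked on both sides of each jump of F_n."""
    n = len(sorted_xs)
    d = 0.0
    for i, x in enumerate(sorted_xs, start=1):
        g = G(x)
        d = max(d, i / n - g, g - (i - 1) / n)
    return d

def min_ks_over_mixtures(xs, F1, F2, grid: int = 1000):
    """Smallest KS distance between the e.d.f. and w*F1 + (1-w)*F2 over
    w = 0, 1/grid, ..., 1; returns (distance, minimizing w)."""
    s = sorted(xs)
    best_d, best_w = math.inf, None
    for k in range(grid + 1):
        w = k / grid
        d = ks_distance(s, lambda x: w * F1(x) + (1.0 - w) * F2(x))
        if d < best_d:
            best_d, best_w = d, w
    return best_d, best_w

# Usage with example (i): F1 ~ N(-1.5, 1), F2 ~ N(1.5, 1); the true w = 0.3 is hypothetical.
rng = random.Random(1)
F1, F2 = normal_cdf(-1.5, 1.0), normal_cdf(1.5, 1.0)
sample = [rng.gauss(-1.5 if rng.random() < 0.3 else 1.5, 1.0) for _ in range(40)]
d_stat, omega_hat = min_ks_over_mixtures(sample, F1, F2)
```

The grid resolution bounds the precision of $\hat\omega$; a finer grid or a one-dimensional optimizer would be the natural refinement.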

Chapter V contains short discussions of problems not included in the previous chapters. In section 5.1 we derive a test for problem B2, using the sample mean as test statistic, for the case when both component distributions are normal with a common variance. In section 5.2 we first extend a lemma dealing with distributions having the monotone likelihood ratio property, and then apply it to mixtures. Next, in section 5.3, we derive a test statistic for problem B4, which can be used together with a Kolmogorov-Smirnov statistic to form a test. In section 5.4 we show that problems in group C can be treated in exactly the same way as the corresponding problems in group A, by proving that the necessary conditions in both groups, which we use to derive the test statistics, are equivalent. In section 5.5 we first obtain necessary and sufficient conditions for the function $F_0(x) - \omega F_1(x)$ to be nondecreasing when both $F_0$ and $F_1$ are distribution functions, and then propose test statistics for problems D1, D2 and D3 (without deriving their distributions). Finally, in sections 5.6 to 5.8, we investigate the problems of testing two mixtures simultaneously and derive large sample test statistics for such problems.


Table 1.1  Grouping of Testing Problems

(Notation:  *   : F0 = ωF1 + (1-ω)F2 properly
            **  : F0 = ωF1 + (1-ω)F2 for some ω, 0 < a ≤ ω ≤ b < 1, a and b known
            S   : random sample
            K   : known
            LS  : location shift
            SC  : scale change
            AC  : absolutely continuous w.r.t. Lebesgue measure.)

Symbol   F0     F1      F2         Additional conditions    H0
A1^1     S^2    S^2     S^2        continuity^3             *
A2       S      S       K          continuity               *
A3       S      K       K          continuity               *
B1       S      K       K,LS^4     symmetry^3               *
B2       S      K       K,LS       *, symmetry              ω = 0 or 1
B3       S      K,AC    K,AC       *, (5.2.5)               ω ≤ ω0
B4       S      K       LS^5       -                        *
B5       S      K       SC^6       -                        *
C1       S      S       S          continuity               **
C2       S      S       K          continuity               **
C3       S      K       K          continuity               **
D1       S      S       -          *                        F2 = F, F known
D2       S      K       -          *                        F2 = F, F known
D3       K      S       -          *                        F2 = F, F known

Symbol   F0     Fi, i = 1,...,k    Additional conditions            H0
E1       S      S or K             F0 = Σ_{i=1}^{k} ωi Fi properly  F0 = Σ_{i=1}^{m} ωi Fi properly, 2 ≤ m < k

Symbol   Fa0    Fb0    Fij (i = a,b; j = 1,2)   Additional conditions               H0
F1       S      S      S or K                   continuity                          Fi0 = ωi Fi1 + (1-ωi) Fi2 properly, i = a,b
F2       S      S      S or K                   continuity, Fa1 = Fb1               same as in F1
F3       S      S      S or K                   continuity, Fa1 = Fb1, Fa2 = Fb2    same as in F1

Table 1.1 (Cont.)

1. Problems in groups A to D can be extended to mixtures of more than two components; they will be denoted by adding a 'prime' to the corresponding symbols, for example A1', B2', C3', etc.

2. In each problem, random variables among samples are assumed to be mutually independent.

3. Continuity or symmetry is assumed to hold for every component distribution.

4. This means that F2(x) = F1(x - t) with t known.

5. This means that F2(x) = F1(x - t) with t unknown.

6. This means that F2(x) = F1(x/t) with t unknown.


CHAPTER II

LOCATION MIXTURES WITH SYMMETRICAL COMPONENTS

In this chapter we investigate the properties of a statistic proposed by Johnson (1973) to test for a mixture of two symmetrical components. Then we extend the method to derive a test for a mixture of three symmetric components. We also study the performance of these tests with respect to certain alternatives.
2.1 Problem B1: the Statistic $\hat\omega_x - \hat\omega_y$ and Its Third and Fourth Cumulants

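Let $F_0$, $F_1$ and $F_2$ be distribution functions. Assume that, for $i = 1, 2$, $F_i$ is of known form, symmetric about its mean $m_i$, with common variance $\sigma^2$. Suppose that $X_1, \dots, X_n$ is a random sample from $F_0$. We wish to test

$$H_0:\ F_0 = \omega F_1 + (1-\omega)F_2 \ \text{properly}.$$

Define

$$Y_i = \begin{cases} 1 & \text{if } X_i < (m_1+m_2)/2 \\ 0 & \text{otherwise} \end{cases} \qquad \text{for } i = 1, \dots, n,$$

and

$$\hat\omega_x = (\bar X - m_2)(m_1 - m_2)^{-1}, \qquad \hat\omega_y = (\bar Y - P_2)(P_1 - P_2)^{-1},$$

where

$$P_i = \Pr\{X_1 < (m_1+m_2)/2 \mid F_i\}, \quad i = 1, 2. \tag{2.1.1}$$

Johnson (1973) proposes the statistic $\hat\omega_x - \hat\omega_y$ to test $H_0$. Under $H_0$ both $\hat\omega_x$ and $\hat\omega_y$ are unbiased estimators of $\omega$. Assuming that $\Pr\{X_1 = (m_1+m_2)/2\} = 0$, he shows that under $H_0$

$$n\,\mathrm{Var}(\hat\omega_x - \hat\omega_y) = n\kappa_2 = (m_1-m_2)^{-2}\sigma^2 + (P_1-P_2)^{-2}P_1P_2 - 2(m_1-m_2)^{-1}(P_1-P_2)^{-1}P_1(E_1-m_1),$$

where

$$E_i = E\big(X_1 \mid X_1 < (m_1+m_2)/2,\ F_i\big), \quad i = 1, 2. \tag{2.1.2}$$

Note that $n\kappa_2$ does not depend on $\omega$. By the central limit theorem a large sample test can be formulated as follows:

$$\text{reject } H_0 \text{ if } |\hat\omega_x - \hat\omega_y|\,\kappa_2^{-1/2} > z_{1-\alpha/2}, \tag{2.1.3}$$

where $z_{1-\alpha/2}$ is such that $\Phi(z_{1-\alpha/2}) = 1 - \alpha/2$, $\Phi$ being the standard normal distribution function.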
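As an illustration of test (2.1.3), the sketch below computes $\hat\omega_x$, $\hat\omega_y$ and the standardized statistic from a sample. It assumes normal components $N(m_i, \sigma^2)$ only so that $P_1$, $P_2$ and $P_1(E_1 - m_1) = -\sigma z((m_2-m_1)/(2\sigma))$ have closed forms, with $z$ the standard normal density; the text itself needs only known symmetric components with common variance. The function name `johnson_test` is hypothetical.

```python
import math

def _Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _z(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def johnson_test(xs, m1, m2, sigma, z_crit=1.96):
    """Return (w_x, w_y, standardized statistic, reject?) for test (2.1.3),
    assuming (for illustration only) that F_i is N(m_i, sigma^2)."""
    n = len(xs)
    e = 0.5 * (m1 + m2)
    p1, p2 = _Phi((e - m1) / sigma), _Phi((e - m2) / sigma)   # P_1, P_2 of (2.1.1)
    c1 = -sigma * _z((e - m1) / sigma)                          # P_1 (E_1 - m_1)
    xbar = sum(xs) / n
    ybar = sum(1 for x in xs if x < e) / n
    w_x = (xbar - m2) / (m1 - m2)
    w_y = (ybar - p2) / (p1 - p2)
    # n * kappa_2 from the display ending in (2.1.2)
    nk2 = (sigma**2 / (m1 - m2)**2 + p1 * p2 / (p1 - p2)**2
           - 2.0 * c1 / ((m1 - m2) * (p1 - p2)))
    stat = (w_x - w_y) / math.sqrt(nk2 / n)
    return w_x, w_y, stat, abs(stat) > z_crit
```

For instance, the symmetric sample $\{-1, 1\}$ with $m_1 = -1$, $m_2 = 1$, $\sigma = 1$ makes both estimates equal to $1/2$, so the statistic is essentially zero and $H_0$ is not rejected.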

In order to see more clearly how this large sample test performs, we calculate the third and fourth cumulants of $\hat\omega_x - \hat\omega_y$ in the following two propositions.

Proposition 2.1

With $\hat\omega_x$ and $\hat\omega_y$ defined as above, we have under $H_0$

$$\begin{aligned} n^2\kappa_3 &= n^2E(\hat\omega_x - \hat\omega_y)^3 \\ &= (1-2\omega)\Big\{3(m_1-m_2)^{-2}(P_1-P_2)^{-1}\Big[\int_{-\infty}^{(m_1+m_2)/2}(x-m_1)^2\,dF_1(x) - P_1\sigma^2\Big] \\ &\qquad + 3P_1(E_1-m_1)(m_1-m_2)^{-1}(P_1-P_2)^{-1} - P_1P_2(P_1-P_2)^{-2}\Big\}, \end{aligned} \tag{2.1.4}$$

where $P_1$ is defined by (2.1.1) and $E_1$ by (2.1.2).

Proof: See section 2.5.

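Proposition 2.2

With $\hat\omega_x$ and $\hat\omega_y$ defined as above, we have under $H_0$

$$\begin{aligned} n^3\kappa_4 &= n^3E(\hat\omega_x-\hat\omega_y)^4 - 3n^3\big[E(\hat\omega_x-\hat\omega_y)^2\big]^2 \\ &= (m_1-m_2)^{-4}\Big[\int_{-\infty}^{\infty}(x-m_1)^4\,dF_1(x) - 3\sigma^4\Big] \\ &\quad - 4(m_1-m_2)^{-3}(P_1-P_2)^{-1}\Big[\int_{-\infty}^{e}(x-m_1)^3\,dF_1(x) - 3\sigma^2P_1(E_1-m_1)\Big] \\ &\quad - 6(m_1-m_2)^{-2}(P_1-P_2)^{-1}\Big[\int_{-\infty}^{e}(x-m_1)^2\,dF_1(x) - P_1\sigma^2\Big] \\ &\quad - 12(m_1-m_2)^{-2}(P_1-P_2)^{-2}P_1^2(E_1-m_1)^2 \\ &\quad - 4(m_1-m_2)^{-1}(P_1-P_2)^{-3}(1-6P_1P_2)P_1(E_1-m_1) \\ &\quad + (P_1-P_2)^{-4}P_1P_2(1-6P_1P_2), \end{aligned}$$

where $e = (m_1+m_2)/2$, $P_1$ is defined by (2.1.1) and $E_1$ by (2.1.2).

Proof: See section 2.5.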

From these propositions we find that, under $H_0$, the value of $n^2\kappa_3$ is proportional to $(1-2\omega)$, while $n^3\kappa_4$ does not depend on $\omega$ at all. In all, among the first four cumulants of $\hat\omega_x - \hat\omega_y$, only the third cumulant depends on $\omega$, and it does so only through the linear factor $(1-2\omega)$.

Define the shape factors

$$\gamma_1 = \kappa_3/\kappa_2^{3/2}, \qquad \gamma_2 = \kappa_4/\kappa_2^{2}.$$

Then under $H_0$, only $\gamma_1$ will depend on $\omega$.

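In the following we compute values of $\gamma_1$ and $\gamma_2$ for various values of $\Delta$, which is defined as

$$\Delta = (m_2 - m_1)\sigma^{-1},$$

for the case that $F_i$ is distributed as $N(m_i, \sigma^2)$, $i = 1, 2$. Then by definition we have, with $e = (m_1+m_2)/2$,

$$\begin{aligned} P_1 &= \Phi(\Delta/2), \\ P_1(E_1-m_1) &= -\sigma z(\Delta/2), \\ \int_{-\infty}^{e}(x-m_1)^2\,dF_1(x) - P_1\sigma^2 &= -\tfrac{\Delta}{2}\,\sigma^2 z(\Delta/2), \\ \int_{-\infty}^{e}(x-m_1)^3\,dF_1(x) - 3\sigma^2P_1(E_1-m_1) &= \big(1 - \tfrac{\Delta^2}{4}\big)\sigma^3 z(\Delta/2), \\ \int_{-\infty}^{\infty}(x-m_1)^4\,dF_1(x) - 3\sigma^4 &= 0, \end{aligned}$$

where $z(x) = (2\pi)^{-1/2}e^{-x^2/2}$.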

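It follows after some simplification that

$$\begin{aligned} n\kappa_2 &= \Delta^{-2} - 2\Delta^{-1}(P_1-P_2)^{-1}z(\Delta/2) + (P_1-P_2)^{-2}P_1P_2, \\ n^2\kappa_3(1-2\omega)^{-1} &= \tfrac{3}{2}\Delta^{-1}(P_1-P_2)^{-1}z(\Delta/2) - (P_1-P_2)^{-2}P_1P_2, \\ n^3\kappa_4 &= -\Delta^{-1}(P_1-P_2)^{-3}(2-16P_1P_2)z(\Delta/2) - 12\Delta^{-2}(P_1-P_2)^{-2}z^2(\Delta/2) \\ &\quad + 4\Delta^{-3}(P_1-P_2)^{-1}z(\Delta/2) + (P_1-P_2)^{-4}P_1P_2(1-6P_1P_2). \end{aligned}$$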

Note that as $x \to \infty$,

$$[1-\Phi(x)]\,z^{-1}(x) = x^{-1} + o(x^{-1}), \qquad \lim_{x\to\infty} x^k z(x) = 0 \ \text{ for any } k.$$

Hence, as $\Delta$ approaches $\infty$,

$$\sqrt{n}\,(1-2\omega)^{-1}\gamma_1 \approx \big[\tfrac{3}{2}\Delta^{-1}z(\Delta/2) - 2\Delta^{-1}z(\Delta/2)\big]\big[\Delta^{-2} - 2\Delta^{-1}z(\Delta/2) + 2\Delta^{-1}z(\Delta/2)\big]^{-3/2} = -\tfrac{1}{2}\Delta^2 z(\Delta/2) \to 0.$$

On the other hand, an application of L'Hôpital's rule shows that $\sqrt{n}\,(1-2\omega)^{-1}\gamma_1 \to 0$ as $\Delta \to 0$ as well. Since the function (of $\Delta$) $\sqrt{n}\,\gamma_1(1-2\omega)^{-1}$ is not identically zero, there exists at least one extremum in the range $(0, \infty)$. Table 2.1 shows that a minimum value of about $-.28961$ for $\sqrt{n}\,\gamma_1(1-2\omega)^{-1}$ occurs when $\Delta$ belongs to the interval $(2.71, 2.73)$.


Table 2.1

   Δ       √n γ1 / (1-2ω)
   .0        .0
   .01      -.001642
   .1       -.016410
   1        -.15681
   1.5      -.22022
   2.0      -.26510
   2.5      -.28735
   2.7      -.28959
   2.71     -.28961
   2.72     -.28961
   2.73     -.28961
   3        -.28604
   5        -.12905
   10       -.00006

Similarly, as $\Delta$ approaches $\infty$, $n\gamma_2 \to 0$, and as $\Delta$ approaches 0,

$$n\gamma_2 \to \big[4 - 12 + 4\pi - \pi^2/2\big]\big[1 - 2 + \pi/2\big]^{-2} = -1.13082.$$
Table 2.2

   Δ       nγ2
   0       -1.13082
   .01     -1.13081
   .1      -1.12938
   .5      -1.09608
   1       -1.00679
   5        -.16626
   10       -.000047
   15        0

From Tables 2.1 and 2.2, for $n \ge 100$,

$$|\gamma_1| < .029\,|1-2\omega| \le .029 \qquad\text{and}\qquad |\gamma_2| < .01131.$$

For $n \ge 25$,

$$|\gamma_1| < .058\,|1-2\omega| \le .058 \qquad\text{and}\qquad |\gamma_2| < .045233.$$

Recall that the third and fourth cumulants of a normal distribution are both zero. It appears that, in the case of two normal components and for $n \ge 100$, the normal distribution is a good approximation to that of the test statistic $(\hat\omega_x - \hat\omega_y)\kappa_2^{-1/2}$. Even for $n$ as small as 25, we can still expect the normal to be a good approximation.
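The closed forms for $n\kappa_2$, $n^2\kappa_3$ and $n^3\kappa_4$ are easy to evaluate numerically. The sketch below, whose function name is hypothetical, returns $\sqrt{n}\,\gamma_1(1-2\omega)^{-1}$ and $n\gamma_2$ for a given $\Delta > 0$ under the two-normal-component assumption of this section. It can be used to reproduce entries of Tables 2.1 and 2.2; for example, $\Delta = 1$ should give values near $-.15681$ and $-1.00679$. At $\Delta = 0$ itself the formulas are indeterminate, so the limits quoted above are used instead.

```python
import math

def _Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _z(x: float) -> float:
    """Standard normal density z(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def shape_factors(delta: float):
    """(sqrt(n)*gamma_1/(1 - 2*omega), n*gamma_2) for components N(m_i, sigma^2),
    delta = (m2 - m1)/sigma > 0, from the closed forms for n*k2, n^2*k3, n^3*k4."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    z = _z(delta / 2.0)
    p1 = _Phi(delta / 2.0)
    p2 = 1.0 - p1
    d = p1 - p2
    q = p1 * p2
    nk2 = delta**-2 - 2.0 * z / (delta * d) + q / d**2
    n2k3 = 1.5 * z / (delta * d) - q / d**2          # already divided by (1 - 2*omega)
    n3k4 = (-(2.0 - 16.0 * q) * z / (delta * d**3)
            - 12.0 * z * z / (delta**2 * d**2)
            + 4.0 * z / (delta**3 * d)
            + q * (1.0 - 6.0 * q) / d**4)
    return n2k3 / nk2**1.5, n3k4 / nk2**2

for delta in (1.0, 2.72, 5.0):
    print(delta, shape_factors(delta))
```

Dividing the returned values by $\sqrt{n}$ and $n$ gives $\gamma_1/(1-2\omega)$ and $\gamma_2$ for a particular sample size.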

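2.2 Two Normal Components vs. a Single Normal Alternative

In order to assess the properties of the test procedure (2.1.3), it is desirable to study its performance when the distribution $F_0$ is not a mixture of two components. In this section we study the following null and alternative hypotheses:

$$\begin{aligned} H_0&:\ F_0 = \omega F_1 + (1-\omega)F_2 \ \text{properly, where } F_i \text{ is distributed as } N(m_i, \sigma^2),\ i = 1, 2, \\ H_1&:\ F_0 \text{ is distributed as } N(\mu, \tau^2). \end{aligned} \tag{2.2.1}$$

Suppose that $X_1, \dots, X_n$ is a random sample from $F_0$. We will derive the cumulants of $\hat\omega_x - \hat\omega_y$ under $H_1$ up to fourth order, and then use a 4-term Gram-Charlier expansion to compute the approximate power of the test (2.1.3) with respect to the above alternative $H_1$. Using the same notation as in section 2.1, under $H_1$

$$E(\hat\omega_x \mid H_1) = (m_1-m_2)^{-1}(\mu - m_2), \qquad E(\hat\omega_y \mid H_1) = (P_1-P_2)^{-1}(P - P_2),$$

where $P = \Phi(K)$ and $K = \big[(m_1+m_2)/2 - \mu\big]\tau^{-1}$,

$$nV_2 = n\mathrm{Var}(\hat\omega_x - \hat\omega_y \mid H_1) = (m_1-m_2)^{-2}\tau^2 + (P_1-P_2)^{-2}P(1-P) + 2(m_1-m_2)^{-1}(P_1-P_2)^{-1}\tau z(K), \tag{2.2.2}$$

$$\begin{aligned} n^2V_3 &= n^2E\big\{[\hat\omega_x - \hat\omega_y - E(\hat\omega_x \mid H_1) + E(\hat\omega_y \mid H_1)]^3 \mid H_1\big\} \\ &= 3(m_1-m_2)^{-2}(P_1-P_2)^{-1}\tau^2Kz(K) - 3(m_1-m_2)^{-1}(P_1-P_2)^{-2}(1-2P)\tau z(K) \\ &\quad - (P_1-P_2)^{-3}P(1-P)(1-2P), \end{aligned} \tag{2.2.3}$$

$$\begin{aligned} n^3V_4 &= n^3\big\{E\big([\hat\omega_x - \hat\omega_y - E(\hat\omega_x \mid H_1) + E(\hat\omega_y \mid H_1)]^4 \mid H_1\big) - 3V_2^2\big\} \\ &= 4(m_1-m_2)^{-3}(P_1-P_2)^{-1}\tau^3z(K)(K^2-1) - 6(m_1-m_2)^{-2}(P_1-P_2)^{-2}\tau^2\big[(1-2P)Kz(K) + 2z^2(K)\big] \\ &\quad + 4(m_1-m_2)^{-1}(P_1-P_2)^{-3}(1-6P+6P^2)\tau z(K) + (P_1-P_2)^{-4}P(1-P)(1-6P+6P^2). \end{aligned} \tag{2.2.4}$$

For a 5% asymptotic significance level, the critical region for rejecting $H_0$ is $\{|\hat\omega_x - \hat\omega_y| > 1.96\kappa_2^{1/2}\}$, where $\kappa_2$ is given by (2.1.5). So the asymptotic power of the test is

$$\begin{aligned} \Pr\{|\hat\omega_x - \hat\omega_y| > 1.96\kappa_2^{1/2} \mid H_1\} &= \tilde F\big(V_2^{-1/2}[-1.96\kappa_2^{1/2} - E(\hat\omega_x \mid H_1) + E(\hat\omega_y \mid H_1)]\big) \\ &\quad + 1 - \tilde F\big(V_2^{-1/2}[1.96\kappa_2^{1/2} - E(\hat\omega_x \mid H_1) + E(\hat\omega_y \mid H_1)]\big), \end{aligned} \tag{2.2.5}$$

where

$$C = V_2^{-1/2}\big[\hat\omega_x - \hat\omega_y - E(\hat\omega_x \mid H_1) + E(\hat\omega_y \mid H_1)\big]$$

and

$$\tilde F(x) = \Phi(x) - z(x)\Big[\tfrac{1}{6}V_3V_2^{-3/2}(x^2-1) + \tfrac{1}{24}V_4V_2^{-2}(x^3-3x)\Big]$$

is the 4-term Gram-Charlier expansion of the cumulative distribution of $C$. Table 2.3 contains values of power for various values of $\tau$, $\mu$, $m_1$ and $n$.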


Table 2.3  Approximate power of the test (2.1.3) for two normal components against a single normal alternative (α = 5%, σ = 1, m2 = -m1)

                    τ = .5          τ = 1.0         τ = 2.0         τ = 5.0
             μ    n=100  n=400    n=100  n=400    n=100  n=400    n=100  n=400
  m1 = 1    .2    .799   .999     .0786  .130     .265   .501     .693   .753
            .4    .999   1.0      .109   .262     .505   .915     .753   .895
            .6    1.0    1.0      .116   .305     .758   .997     .827   .975
            .8    1.0    1.0      .0825  .180     .918   1.0      .895   .997
  m1 = 2    .2    1.0    1.0      .674   .989     .201   .347     .685   .853
            .4    1.0    1.0      .962   1.0      .385   .789     .853   .993
            .6    1.0    1.0      .981   1.0      .679   .989     .958   1.0
            .8    1.0    1.0      .778   1.0      .920   1.0      .993   1.0

Comparing  Table  2.3  with  Johnson's  Table  Ib  (1973),  which  is 
calculated  by  normal  approximation,  we  find  that  the  values  in  the 
two  tables  are  very  close  to  each  other. 
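The power formula (2.2.5) can be sketched as follows. This illustrative Python implementation assumes the setting of (2.2.1), namely normal components with common $\sigma$ and a normal alternative, and plugs the cumulants (2.2.2) to (2.2.4) into the 4-term Gram-Charlier approximation $\tilde F$. The function name and argument order are hypothetical.

```python
import math

def _Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _z(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def approx_power(m1, m2, sigma, mu, tau, n, z_crit=1.96):
    """Approximate power (2.2.5) of test (2.1.3) against H1: F0 = N(mu, tau^2)."""
    e = 0.5 * (m1 + m2)
    a, p1, p2 = 1.0 / (m1 - m2), _Phi((e - m1) / sigma), _Phi((e - m2) / sigma)
    b = 1.0 / (p1 - p2)
    # H0 variance n*kappa_2, using P_1(E_1 - m_1) = -sigma * z((e - m1)/sigma)
    nk2 = a * a * sigma**2 + b * b * p1 * p2 + 2.0 * a * b * sigma * _z((e - m1) / sigma)
    # Cumulants under H1, (2.2.2)-(2.2.4)
    K = (e - mu) / tau
    P, zk = _Phi(K), _z(K)
    mean = a * (mu - m2) - b * (P - p2)          # E(w_x | H1) - E(w_y | H1)
    v2 = a * a * tau**2 + b * b * P * (1 - P) + 2 * a * b * tau * zk
    v3 = (3 * a * a * b * tau**2 * K * zk - 3 * a * b * b * (1 - 2 * P) * tau * zk
          - b**3 * P * (1 - P) * (1 - 2 * P))
    s = 1 - 6 * P + 6 * P * P
    v4 = (4 * a**3 * b * tau**3 * zk * (K * K - 1)
          - 6 * a * a * b * b * tau**2 * ((1 - 2 * P) * K * zk + 2 * zk * zk)
          + 4 * a * b**3 * s * tau * zk + b**4 * P * (1 - P) * s)
    g1 = v3 / v2**1.5 / math.sqrt(n)   # standardized third cumulant of C
    g2 = v4 / v2**2 / n                # standardized fourth cumulant of C

    def gc_cdf(x):
        # 4-term Gram-Charlier approximation to the distribution of C
        return _Phi(x) - _z(x) * (g1 / 6 * (x * x - 1) + g2 / 24 * (x**3 - 3 * x))

    sd0, sd1 = math.sqrt(nk2 / n), math.sqrt(v2 / n)
    lower = (-z_crit * sd0 - mean) / sd1
    upper = (z_crit * sd0 - mean) / sd1
    return gc_cdf(lower) + 1.0 - gc_cdf(upper)
```

With $m_1 = 1$, $m_2 = -1$, $\sigma = 1$, $\mu = .2$ and $\tau = 1$, this gives values close to the .0786 ($n = 100$) and .130 ($n = 400$) entries of Table 2.3.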

For the reverse situation in which

$$H_0:\ F_0 \text{ is distributed as } N(\mu, \sigma^2), \qquad H_1:\ F_0 = \omega F_1 + (1-\omega)F_2 \ \text{properly},$$

where $F_i$ is distributed as $N(\mu_i, \sigma^2)$, $i = 1, 2$, see Bryant (1973, pp. 28-37), who uses a different approach.
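2.3 Problem B1': Three Components

Let $X_1, X_2, \dots, X_n$ be a random sample from $F_0$, and let $F_i(x)$ be of known form, symmetric about its mean $m_i$, with common variance $\sigma^2$, for $i = 1, 2, 3$. Also assume that $m_1 < m_2$. We wish to test

$$H_0:\ F_0(x) = \omega_1F_1(x) + \omega_2F_2(x) + (1-\omega_1-\omega_2)F_3(x) \ \text{properly}. \tag{2.3.1}$$

Define random variables $Y_{1i}$ and $Y_{2i}$, for $i = 1, 2, \dots, n$, as follows:

$$Y_{1i} = \begin{cases} 1 & \text{if } X_i < a \\ 0 & \text{otherwise} \end{cases}, \qquad Y_{2i} = \begin{cases} 1 & \text{if } X_i > b \\ 0 & \text{otherwise} \end{cases},$$

where $a < b$ are two constants to be defined later. Let

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \bar Y_1 = \frac{1}{n}\sum_{i=1}^{n} Y_{1i}, \qquad \bar Y_2 = \frac{1}{n}\sum_{i=1}^{n} Y_{2i}.$$

Under $H_0$,

$$\begin{aligned} E\bar Y_1 &= \omega_1P_{11} + \omega_2P_{12} + (1-\omega_1-\omega_2)P_{13}, \\ E\bar Y_2 &= \omega_1P_{21} + \omega_2P_{22} + (1-\omega_1-\omega_2)P_{23}, \end{aligned} \tag{2.3.2}$$

where

$$P_{1i} = \Pr\{X_1 < a \mid F_i\} \quad\text{and}\quad P_{2i} = \Pr\{X_1 > b \mid F_i\}, \quad i = 1, 2, 3. \tag{2.3.3}$$

Eliminating $\omega_1$ and $\omega_2$ in (2.3.2), we obtain

$$E\big[\bar X - \hat\omega_1m_1 - \hat\omega_2m_2 - (1-\hat\omega_1-\hat\omega_2)m_3\big] = 0,$$

where

$$\hat\omega_1 = \frac{(\bar Y_1-P_{13})(P_{22}-P_{23}) - (\bar Y_2-P_{23})(P_{12}-P_{13})}{(P_{11}-P_{13})(P_{22}-P_{23}) - (P_{21}-P_{23})(P_{12}-P_{13})} \tag{2.3.4}$$

and

$$\hat\omega_2 = \frac{(\bar Y_1-P_{13})(P_{21}-P_{23}) - (\bar Y_2-P_{23})(P_{11}-P_{13})}{(P_{12}-P_{13})(P_{21}-P_{23}) - (P_{22}-P_{23})(P_{11}-P_{13})}. \tag{2.3.5}$$

Note that $E\hat\omega_1 = \omega_1$ and $E\hat\omega_2 = \omega_2$. Let

$$T_1 = \bar X - \hat\omega_1m_1 - \hat\omega_2m_2 - (1-\hat\omega_1-\hat\omega_2)m_3;$$

then $E(T_1 \mid H_0) = 0$.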

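Proposition 2.3

Under $H_0$ and the above conditions,

$$n\mathrm{Var}\,T_1 = \sigma^2 + L_0 + \omega_1L_1 + \omega_2L_2, \tag{2.3.6}$$

where

$$\begin{aligned} L_0 &= D_1^2D_3^{-2}P_{13}(1-P_{13}) + D_2^2D_3^{-2}P_{23}(1-P_{23}) - 2D_1D_2D_3^{-2}P_{13}P_{23} \\ &\quad + 2D_1D_3^{-1}P_{13}(E_{13}-m_3) + 2D_2D_3^{-1}P_{23}(E_{23}-m_3), \end{aligned} \tag{2.3.7}$$

$$\begin{aligned} L_1 &= -D_1D_2D_3^{-2}(P_{11}-P_{13}+P_{21}-P_{23}) \\ &\quad + D_1D_3^{-1}\big[2P_{11}(E_{11}-m_1) - 2P_{13}(E_{13}-m_3) + (m_1-m_3)(P_{11}+P_{13}-1)\big] \\ &\quad + D_2D_3^{-1}\big[2P_{21}(E_{21}-m_1) - 2P_{23}(E_{23}-m_3) + (m_1-m_3)(P_{21}+P_{23}-1)\big], \\ L_2 &= -D_1D_2D_3^{-2}(P_{12}-P_{13}+P_{22}-P_{23}) \\ &\quad + D_1D_3^{-1}\big[2P_{12}(E_{12}-m_2) - 2P_{13}(E_{13}-m_3) + (m_2-m_3)(P_{12}+P_{13}-1)\big] \\ &\quad + D_2D_3^{-1}\big[2P_{22}(E_{22}-m_2) - 2P_{23}(E_{23}-m_3) + (m_2-m_3)(P_{22}+P_{23}-1)\big], \end{aligned} \tag{2.3.8}$$

$$\begin{aligned} D_1 &= (m_1-m_3)(P_{22}-P_{23}) - (m_2-m_3)(P_{21}-P_{23}), \\ D_2 &= (m_2-m_3)(P_{11}-P_{13}) - (m_1-m_3)(P_{12}-P_{13}), \\ D_3 &= (P_{12}-P_{13})(P_{21}-P_{23}) - (P_{22}-P_{23})(P_{11}-P_{13}), \\ E_{1i} &= E(X_1 \mid X_1 < a,\ F_i), \qquad E_{2i} = E(X_1 \mid X_1 > b,\ F_i), \quad i = 1, 2, 3, \end{aligned} \tag{2.3.9}$$

and the $P_{ji}$ are defined by (2.3.3).

Proof: See section 2.6.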


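Proposition 2.3 shows that $n\mathrm{Var}\,T_1$ is a linear combination of $\sigma^2 + L_0$, $L_1$ and $L_2$. It was hoped that, by assigning appropriate values to $a$ and $b$, both $L_1$ and $L_2$ would vanish. Unfortunately, such values could not readily be found. In the following we let

$$a = (m_1+m_3)/2 \qquad\text{and}\qquad b = (m_2+m_3)/2,$$

and show how the expression for $n\mathrm{Var}\,T_1$ can be partially simplified. It follows from these specific values of $a$ and $b$ that

$$P_{11} + P_{13} = 1, \qquad P_{22} + P_{23} = 1,$$

$$P_{11}(E_{11}-m_1) = P_{13}(E_{13}-m_3), \qquad P_{22}(E_{22}-m_2) = P_{23}(E_{23}-m_3),$$

and $L_1$ and $L_2$ simplify to the following:

$$\begin{aligned} L_1 &= -D_1D_2D_3^{-2}(P_{11}-P_{13}+P_{21}-P_{23}) + D_2D_3^{-1}\big[2P_{21}(E_{21}-m_1) - 2P_{23}(E_{23}-m_3) + (m_1-m_3)(P_{21}+P_{23}-1)\big], \\ L_2 &= -D_1D_2D_3^{-2}(P_{12}-P_{13}+P_{22}-P_{23}) + D_1D_3^{-1}\big[2P_{12}(E_{12}-m_2) - 2P_{13}(E_{13}-m_3) + (m_2-m_3)(P_{12}+P_{13}-1)\big]. \end{aligned}$$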


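Next we derive a large sample test for $H_0$. Note first that $\hat\omega_1$ and $\hat\omega_2$ are both linear functions of $Y_{11}, Y_{12}, \dots, Y_{1n}, Y_{21}, Y_{22}, \dots, Y_{2n}$, and that $\bar X$, $\bar Y_1$ and $\bar Y_2$ are all means of i.i.d. random variables. Therefore $T_1$ can be expressed as a mean of i.i.d. random variables. Then by the central limit theorem (since $\mathrm{Var}\,T_1 < \infty$), under $H_0$

$$T_1(\mathrm{Var}\,T_1)^{-1/2} \to N(0,1) \quad\text{in distribution}.$$

But $\mathrm{Var}\,T_1$ depends on the unknown parameters $\omega_1$ and $\omega_2$. Using $\hat\omega_1$ in (2.3.4) and $\hat\omega_2$ in (2.3.5) as estimators of $\omega_1$ and $\omega_2$ respectively, we obtain an estimator of $\mathrm{Var}\,T_1$:

$$n\widehat{\mathrm{Var}}\,T_1 = \sigma^2 + L_0 + \hat\omega_1L_1 + \hat\omega_2L_2. \tag{2.3.11}$$

From $\mathrm{Var}\,Y_{1i} < \infty$ and $\mathrm{Var}\,Y_{2i} < \infty$, it follows that $\hat\omega_1 \to \omega_1$ a.s. and $\hat\omega_2 \to \omega_2$ a.s.; hence $n\widehat{\mathrm{Var}}\,T_1 \to n\mathrm{Var}\,T_1$ a.s. In the following we state two theorems which will be used to derive large sample results.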

Theorem 2.1 (Slutsky; see e.g. Cramér, 1946, p. 255)

If $X_n, Y_n, \dots, Z_n$ are random variables converging in probability to the constants $x, y, \dots, z$ respectively, then any rational function $R(X_n, Y_n, \dots, Z_n)$ converges in probability to the constant $R(x, y, \dots, z)$, provided that the latter is finite. It follows that any power $R^k$ with $k > 0$ converges in probability to $R^k(x, y, \dots, z)$.

Theorem 2.2 (see Cramér, 1946, p. 254)

Let $X_1, X_2, \dots$ be a sequence of random variables with distribution functions $F_1, F_2, \dots$. Suppose that $F_n(x)$ tends to a distribution function $F(x)$ as $n$ tends to $\infty$. Let $Y_1, Y_2, \dots$ be another sequence of random variables, and suppose that $Y_n$ converges in probability to a constant $c$. Then the distribution function of $X_n + Y_n$ tends to $F(x - c)$. Further, if $c > 0$, the distribution function of $X_nY_n$ tends to $F(x/c)$, while that of $X_n/Y_n$ tends to $F(cx)$.

By Theorems 2.1 and 2.2 we have

$$T_1(\widehat{\mathrm{Var}}\,T_1)^{-1/2} \to N(0,1) \quad\text{in distribution}.$$

Therefore a large sample test can be formulated as

$$\text{Reject } H_0 \text{ if } |T_1|(\widehat{\mathrm{Var}}\,T_1)^{-1/2} > z_{1-\alpha/2}. \tag{2.3.12}$$
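Here is a sketch of the three-component test (2.3.12). For illustration it assumes that all three components are normal $N(m_i, \sigma^2)$ with $m_1 < m_2$, and it uses $a = (m_1+m_3)/2$ and $b = (m_2+m_3)/2$ as above. Rather than coding $L_0$, $L_1$ and $L_2$ term by term, it uses the fact that the summand of $T_1$ has conditional mean zero under each component. That makes $n\mathrm{Var}\,T_1$ the $\omega$-weighted average of the per-component variances, which is equivalent to $\sigma^2 + L_0 + \omega_1L_1 + \omega_2L_2$. All function names are hypothetical.

```python
import math
import random

def _Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _z(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_parts(m, sigma):
    """a, b, P_{1i}, P_{2i}, P_{1i}(E_{1i}-m_i), P_{2i}(E_{2i}-m_i) for N(m_i, sigma^2)."""
    m1, m2, m3 = m
    a, b = 0.5 * (m1 + m3), 0.5 * (m2 + m3)
    p1 = [_Phi((a - mi) / sigma) for mi in m]          # Pr{X < a | F_i}
    p2 = [1.0 - _Phi((b - mi) / sigma) for mi in m]    # Pr{X > b | F_i}
    c1 = [-sigma * _z((a - mi) / sigma) for mi in m]   # E[(X - m_i) 1{X < a} | F_i]
    c2 = [sigma * _z((b - mi) / sigma) for mi in m]    # E[(X - m_i) 1{X > b} | F_i]
    return a, b, p1, p2, c1, c2

def coefficients(m, p1, p2):
    """D1/D3 and D2/D3 with D1, D2, D3 as in (2.3.9)."""
    m1, m2, m3 = m
    d1 = (m1 - m3) * (p2[1] - p2[2]) - (m2 - m3) * (p2[0] - p2[2])
    d2 = (m2 - m3) * (p1[0] - p1[2]) - (m1 - m3) * (p1[1] - p1[2])
    d3 = (p1[1] - p1[2]) * (p2[0] - p2[2]) - (p2[1] - p2[2]) * (p1[0] - p1[2])
    return d1 / d3, d2 / d3

def solve_weights(y1, y2, p1, p2):
    """Estimates (2.3.4) and (2.3.5): Cramer's rule on the linear equations (2.3.2)."""
    det = (p1[0] - p1[2]) * (p2[1] - p2[2]) - (p2[0] - p2[2]) * (p1[1] - p1[2])
    w1 = ((y1 - p1[2]) * (p2[1] - p2[2]) - (y2 - p2[2]) * (p1[1] - p1[2])) / det
    w2 = ((y2 - p2[2]) * (p1[0] - p1[2]) - (y1 - p1[2]) * (p2[0] - p2[2])) / det
    return w1, w2

def three_component_test(xs, m, sigma):
    """Return (T1, n * estimated Var T1, standardized statistic) for test (2.3.12)."""
    n = len(xs)
    a, b, p1, p2, c1, c2 = normal_parts(m, sigma)
    alpha, beta = coefficients(m, p1, p2)
    xbar = sum(xs) / n
    y1 = sum(1 for x in xs if x < a) / n
    y2 = sum(1 for x in xs if x > b) / n
    t1 = xbar - m[2] + alpha * (y1 - p1[2]) + beta * (y2 - p2[2])
    w1, w2 = solve_weights(y1, y2, p1, p2)
    # g[i] = Var(summand | F_i) - sigma^2, so sigma^2 + L0 = sigma^2 + g[2],
    # L1 = g[0] - g[2] and L2 = g[1] - g[2].
    g = [alpha**2 * p1[i] * (1 - p1[i]) + beta**2 * p2[i] * (1 - p2[i])
         - 2 * alpha * beta * p1[i] * p2[i] + 2 * alpha * c1[i] + 2 * beta * c2[i]
         for i in range(3)]
    nvar = sigma**2 + g[2] + w1 * (g[0] - g[2]) + w2 * (g[1] - g[2])
    if nvar <= 0:
        raise ValueError("variance estimate is not positive; sample may be too small")
    return t1, nvar, t1 / math.sqrt(nvar / n)

# Usage on a hypothetical null sample with weights (0.2, 0.5, 0.3).
m, sigma = (-2.0, 0.0, 2.5), 1.0
rng = random.Random(0)
xs = [rng.gauss(rng.choices(m, weights=[0.2, 0.5, 0.3])[0], sigma) for _ in range(500)]
t1, nvar, stat = three_component_test(xs, m, sigma)
```

Because the $\hat\omega_i$ are unrestricted, the estimated variance can fail to be positive in small samples; the sketch raises an error in that case instead of returning a statistic.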
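2.4 Two Components vs. Three Components

Let $X_1, \dots, X_n$ be a random sample from $F_0$, and suppose that $F_1$ and $F_2$ are each of known form, symmetric about its mean $m_i$, with common variance $\sigma^2$. We wish to test the following hypotheses:

$$H_0:\ F_0 = \omega F_1 + (1-\omega)F_2 \ \text{properly}, \tag{2.4.1}$$

$$H_1:\ F_0 = \omega(1-\omega')F_1 + (1-\omega)(1-\omega')F_2 + \omega'F_3 \ \text{properly}, \tag{2.4.2}$$

where $\omega'$ is known $(0 < \omega' < 1)$, and $F_3$ has mean $m_3$ and variance $\sigma^2$.

Following the argument in section 2.1, we may use $(\hat\omega_x - \hat\omega_y)\kappa_2^{-1/2}$ as test statistic and reject $H_0$ if $|\hat\omega_x - \hat\omega_y| > z_{1-\alpha/2}\kappa_2^{1/2}$. Under $H_1$,

$$E(\hat\omega_x - \hat\omega_y \mid H_1) = \omega'\big[(m_1-m_2)^{-1}(m_3-m_2) - (P_1-P_2)^{-1}(P_3-P_2)\big], \tag{2.4.3}$$

where

$$P_i = \Pr\{X_1 < (m_1+m_2)/2 \mid F_i\}, \quad i = 1, 2, 3, \tag{2.4.4}$$

and

$$\begin{aligned} n\mathrm{Var}(\hat\omega_x - \hat\omega_y \mid H_1) &= (m_1-m_2)^{-2}\sigma^2 + (P_1-P_2)^{-2}\big[(1-\omega')P_1P_2 + \omega'P_3(1-P_3)\big] \\ &\quad - 2(m_1-m_2)^{-1}(P_1-P_2)^{-1}\big[(1-\omega')P_1(E_1-m_1) + \omega'P_3(E_3-m_3)\big] \\ &\quad + \omega'(1-\omega')\big[(m_1-m_2)^{-1}(m_3-m_2) - (P_1-P_2)^{-1}(P_3-P_2)\big]^2, \end{aligned} \tag{2.4.5}$$

where $E_i = E(X_1 \mid X_1 < (m_1+m_2)/2,\ F_i)$, $i = 1, 2, 3$. Note that $n\mathrm{Var}(\hat\omega_x - \hat\omega_y \mid H_1)$ and $E(\hat\omega_x - \hat\omega_y \mid H_1)$ depend only on $\omega'$, not on $\omega$. If $\omega' = 0$, then $n\mathrm{Var}(\hat\omega_x - \hat\omega_y \mid H_1) = n\mathrm{Var}(\hat\omega_x - \hat\omega_y \mid H_0)$ (the latter is given by (2.1.3)).

The asymptotic power of the test can then be calculated by varying $F_3$ and $\omega'$ in the following formula:

$$\begin{aligned} \Pr\{|\hat\omega_x - \hat\omega_y| > z_{1-\alpha/2}\kappa_2^{1/2} \mid H_1\} &= \Phi\Big([\mathrm{Var}(\hat\omega_x - \hat\omega_y \mid H_1)]^{-1/2}\big[-z_{1-\alpha/2}\kappa_2^{1/2} - E(\hat\omega_x - \hat\omega_y \mid H_1)\big]\Big) \\ &\quad + \Phi\Big([\mathrm{Var}(\hat\omega_x - \hat\omega_y \mid H_1)]^{-1/2}\big[-z_{1-\alpha/2}\kappa_2^{1/2} + E(\hat\omega_x - \hat\omega_y \mid H_1)\big]\Big). \end{aligned} \tag{2.4.6}$$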
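Formula (2.4.6) can be evaluated directly once $F_3$ is specified. The sketch below assumes, for illustration only, that $F_1$, $F_2$ and $F_3$ are normal with common standard deviation $\sigma$, so that every $P_i$ and $P_i(E_i - m_i)$ has a closed form; the name `power_two_vs_three` is hypothetical. With $\omega' = 0$ the alternative coincides with the null and the power reduces to the nominal level.

```python
import math

def _Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _z(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def power_two_vs_three(m1, m2, sigma, m3, w3, n, z_crit=1.96):
    """Normal-approximation power (2.4.6) against H1 (2.4.2) with omega' = w3,
    assuming F1, F2, F3 are normal with common standard deviation sigma."""
    e = 0.5 * (m1 + m2)
    a = 1.0 / (m1 - m2)
    P = [_Phi((e - mi) / sigma) for mi in (m1, m2, m3)]        # P_i of (2.4.4)
    C = [-sigma * _z((e - mi) / sigma) for mi in (m1, m2, m3)]  # P_i (E_i - m_i)
    b = 1.0 / (P[0] - P[1])
    nk2 = a * a * sigma**2 + b * b * P[0] * P[1] - 2 * a * b * C[0]   # H0 variance
    delta3 = a * (m3 - m2) - b * (P[2] - P[1])
    mean1 = w3 * delta3                                                 # (2.4.3)
    nv1 = (a * a * sigma**2 + b * b * ((1 - w3) * P[0] * P[1] + w3 * P[2] * (1 - P[2]))
           - 2 * a * b * ((1 - w3) * C[0] + w3 * C[2])
           + w3 * (1 - w3) * delta3**2)                                 # (2.4.5)
    sd0, sd1 = math.sqrt(nk2 / n), math.sqrt(nv1 / n)
    return _Phi((-z_crit * sd0 - mean1) / sd1) + _Phi((-z_crit * sd0 + mean1) / sd1)

# Usage: F1 ~ N(-1, 1), F2 ~ N(1, 1), hypothetical third component N(3, 1).
for w3 in (0.0, 0.1, 0.3):
    print(w3, power_two_vs_three(-1.0, 1.0, 1.0, 3.0, w3, 100))
```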
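2.5 Proofs of Propositions 2.1 and 2.2

$\{X_1, X_2, \dots, X_n\}$ is a random sample from $F_0$, and $F_1$ and $F_2$ satisfy the same conditions as in section 2.1. For convenience, suppose that $X$ and $X_1$ have the same distribution $F_0$, and that $Y$ and $Y_1$ have the same distribution. By definition

$$\hat\omega_x = (\bar X - m_2)(m_1-m_2)^{-1} \qquad\text{and}\qquad \hat\omega_y = (\bar Y - P_2)(P_1-P_2)^{-1}.$$

Let $U = X - EX$ and $V = Y - EY$. Then, with $e = (m_1+m_2)/2$, we have

$$\begin{aligned} EX &= \omega m_1 + (1-\omega)m_2, \\ EY &= \omega P_1 + (1-\omega)P_2, \\ EU &= EV = 0, \\ EU^2 &= \mathrm{Var}\,X = \sigma^2 + \omega(1-\omega)(m_1-m_2)^2, \\ EV^2 &= \mathrm{Var}\,Y = P_1P_2 + \omega(1-\omega)(P_1-P_2)^2, \\ EUV &= \mathrm{Cov}(X,Y) = P_1(E_1-m_1) + \omega(1-\omega)(m_1-m_2)(P_1-P_2), \\ EU^3 &= \omega(1-\omega)(1-2\omega)(m_1-m_2)^3, \\ EU^2V &= (1-2\omega)\Big[-\int_{-\infty}^{e}(x-m_1)^2\,dF_1(x) + P_1\sigma^2\Big] + \omega(1-\omega)(1-2\omega)(m_1-m_2)^2(P_1-P_2), \\ EUV^2 &= (1-2\omega)(P_1-P_2)\big[P_1(E_1-m_1) + \omega(1-\omega)(m_1-m_2)(P_1-P_2)\big], \\ EV^3 &= \big[P_1P_2 + \omega(1-\omega)(P_1-P_2)^2\big](1-2\omega)(P_1-P_2), \\ EU^4 &= \int_{-\infty}^{\infty}(x-m_1)^4\,dF_1(x) + 6\omega(1-\omega)(m_1-m_2)^2\sigma^2 + \omega(1-\omega)(1-3\omega+3\omega^2)(m_1-m_2)^4, \\ EU^3V &= \int_{-\infty}^{e}(x-m_1)^3\,dF_1(x) + 3\omega(1-\omega)(m_1-m_2)\Big[2\int_{-\infty}^{e}(x-m_1)^2\,dF_1(x) - \sigma^2\Big] \\ &\quad + 3\omega(1-\omega)(m_1-m_2)^2P_1(E_1-m_1) + \omega(1-\omega)(1-3\omega+3\omega^2)(m_1-m_2)^3(P_1-P_2), \\ EU^2V^2 &= -(1-2\omega)^2(P_1-P_2)\int_{-\infty}^{e}(x-m_1)^2\,dF_1(x) + (1-2\omega)^2(P_1-P_2)P_1\sigma^2 \\ &\quad + \omega(1-\omega)(1-3\omega+3\omega^2)(m_1-m_2)^2(P_1-P_2)^2 + \big[P_1P_2 + \omega(1-\omega)(P_1-P_2)^2\big]\sigma^2 \\ &\quad + \omega(1-\omega)(m_1-m_2)^2P_1P_2, \end{aligned}$$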
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EUV^  = [PlP2+a-3‘-+3u.2)(Pl-P2)^][Pl(Ei-m^)+u.(l-(.)(m^-m2)(P^-P2)] 

EV^  = Pi^P2^  + a-2u>+2u>^)P^?2(JP^-P^)^*u>a-ui)a-3w*5w^nP^-P^)^ 

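Proof  of  Proposition  2.1:

$$\begin{aligned}n^2E(\hat\omega_x-\hat\omega_y)^3&=(m_1-m_2)^{-3}EU^3-3(m_1-m_2)^{-2}(P_1-P_2)^{-1}EU^2V\\&\quad+3(m_1-m_2)^{-1}(P_1-P_2)^{-2}EUV^2-(P_1-P_2)^{-3}EV^3\\&=(1-2\omega)\Big\{3(m_1-m_2)^{-2}(P_1-P_2)^{-1}\Big[\int_{-\infty}^{e}(x-m_1)^2\,dF_1(x)-P_1\sigma^2\Big]\\&\qquad+3P_1(E_1-m_1)(m_1-m_2)^{-1}(P_1-P_2)^{-1}-P_1P_2(P_1-P_2)^{-2}\Big\}\end{aligned}$$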

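Proof  of  Proposition  2.2:

Writing $\kappa_4$ for the fourth cumulant,

$$\begin{aligned}n^3\kappa_4(\hat\omega_x-\hat\omega_y)={}&(m_1-m_2)^{-4}[EU^4-3(EU^2)^2]-4(m_1-m_2)^{-3}(P_1-P_2)^{-1}(EU^3V-3EU^2\cdot EUV)\\&+6(m_1-m_2)^{-2}(P_1-P_2)^{-2}[EU^2V^2-EU^2\cdot EV^2-2(EUV)^2]\\&-4(m_1-m_2)^{-1}(P_1-P_2)^{-3}(EUV^3-3EUV\cdot EV^2)+(P_1-P_2)^{-4}[EV^4-3(EV^2)^2]\\
={}&(m_1-m_2)^{-4}\Big[\int_{-\infty}^{\infty}(x-m_1)^4\,dF_1(x)-3\sigma^4\Big]\\&-4(m_1-m_2)^{-3}(P_1-P_2)^{-1}\Big[\int_{-\infty}^{e}(x-m_1)^3\,dF_1(x)-3\sigma^2P_1(E_1-m_1)\Big]\\&-6(m_1-m_2)^{-2}(P_1-P_2)^{-1}\Big[\int_{-\infty}^{e}(x-m_1)^2\,dF_1(x)-P_1\sigma^2\Big]\\&-12(m_1-m_2)^{-2}(P_1-P_2)^{-2}P_1^2(E_1-m_1)^2\\&-4(m_1-m_2)^{-1}(P_1-P_2)^{-3}(1-6P_1P_2)P_1(E_1-m_1)\\&+(P_1-P_2)^{-4}P_1P_2(1-6P_1P_2)\end{aligned}$$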


2.6  Proof  of  Proposition  2.3

We use the same notation as in section 2.3. By definition

Ti  = X '^2^2  ' ^^'*^1  *“2^'"3 


J 
■ * ‘>l‘>3^P'r'’l3’  * V3^P'2-‘’23’-

where $D_1$, $D_2$ and $D_3$ are defined as in Proposition 2.3.

By  definition  we  have 

2 2 2 
Var  = a +a)^  +(02(1-0)2)  (m2 -nij)  -2(o^u)2(m^-m2)  (m2-mj) 

■“2^^12'^13^  ■^“l“2^^1l’^13^^^12'^13^ 

Var  Y2,  = P23(l-P23)-a)^(P21-P23)(l-2P23)-a,2(P22-P23)a-2P23) 

■“lf^2l'^23^  ■“2^^22'^23^  ■^“l“2^^2l'^23^ ^^22*^23^ 

Cov(X^,Y^^)  = Pj^2(Ej^3-m3)+(0j^[P^j^(Ej^j^-m^)-Pj^j(E^2-ra2) 

+ (iiil-m3)  (Pj^2^‘P23) 3 '“i  (™i"n'3)  "“2 ^"‘2 ""*3^  ^^12’^13^ 

■*■‘^2  ^^12  ^^12'"‘2^  '^13  ^^13‘"‘3^  * (in2'"*33  (^12'^133 1 
■“l“2*-^"h.’"*3^  (Pj^2'^13^‘^^"^’'"3^ 

Cov(Xj^,Y2j^)  = P23(E23-m2)-(o^(in^-m2)(P22-P23)-w2("'2’"'3^^^22’^23^ 

■^“1 1^21  ^^2l'"'l^  '^23  ^^23'”*3^  * (n'l*ni3)  (P21  ■P233  3 
■^“2  ^^22  ^^22'"*2^  ’^23^^23'’"3^'^^"*2’'"3^  ^22'^23^  ^ 

■“l“2  ^ ^’^■"‘33  ^'^22’^233'^^"*2’™33  ^^2l’^2333 
Cov(Yii,Y2i)  = Pi3P23'^“i(Pii’Pi3)(P21’^233'^“2^^12’^133^^22’^233 

■^“1^^23^^11'^133‘^*’i3^^21'^2333*“2  ^^13^^22-^233'*’^23^^12'^1333 
^V2t(PirPl33  (P22-P233^(Pi2-"333  (^21-^1 

Therefore 
nD^ar  = D^ar  +D^ar  +D^ar  + ZDj^DjCovCX^.Yj^j^)

"2D2D3Cov(X^.Y2^)  . ZD^D^CovCY^^.Y^p 

can be expressed as a linear combination of $\omega_1$, $\omega_2$, $\omega_1^2$, $\omega_2^2$ and $\omega_1\omega_2$, plus a constant term.

The coefficient of $\omega_1$ is

D2(mj-n,3)2.D2(Pjj-Pj3)  (l-aP^jj^D^tP^i-P^j) 

-ZDj^Dz  [P23  ^^2l'^23^  ^ ‘^13  ^^13""*3^ 

+ (m^  -m^)  ^^11  ‘^13^  1 *^^2^5 1-^21  ^21  '"’l ^ ‘^23  ^^23'"'3^  * “"‘3^  ^^21  ’ ^23^  ^ 

■ ”^1  • 

The coefficient of $\omega_2$ is

D^'"2-'”3)^*d2(Pj3-P„)(1-2P„)*d2(P33-P„)(1-2P3,) 

-2DiD2[Pi3(P22'P23)+P23^^12‘^13^^'^^^1®3^^12^^12“"'2^'^13^^13‘"*3^ 

+ (in2-m3)  (P^^z'Pis^  ] '*■202^3 ^^22^^22'"*2^ '^23^^23'"*3^'*‘ ^’"2’'"3^  ^^22‘^23^^ 


" hh  ■ 

The coefficient of $\omega_1^2$ is

$$-[D_1(m_1-m_3)+D_2(P_{11}-P_{13})+D_3(P_{21}-P_{23})]^2=0$$


The coefficient of $\omega_2^2$ is

$$-[D_1(m_2-m_3)+D_2(P_{12}-P_{13})+D_3(P_{22}-P_{23})]^2=0$$


The coefficient of $\omega_1\omega_2$ is

$$-2[D_1(m_1-m_3)+D_2(P_{11}-P_{13})+D_3(P_{21}-P_{23})][D_1(m_2-m_3)+D_2(P_{12}-P_{13})+D_3(P_{22}-P_{23})]=0$$

The  constant  term  is 

““*®h3(l-^3)*'>^23<l-'’23)-Wl3''23-2Wlj(El3-'"3) 
*“2“3‘’23<E23-V  = “stoS'  ' 


CHAPTER  III

TESTS  USING  RANK-TYPE  STATISTICS

In this chapter we describe a rank-type statistic proposed by Thomas (1969), in the form used by Hariton (1972), and modify it in order to apply it to problems A2 and A3. Next we compare the performance of this modified statistic with Johnson's statistics in the case of two normal components against a single normal alternative. Then we extend the method to obtain test statistics for mixtures of three and four components, and give an algorithm to derive test statistics for mixtures of more than four components. Finally, in section 3.8 we study two special cases of reducing the number of components of a given finite proper mixture.

3.1  Problem  A1 

Suppose that $X_{i1},\dots,X_{in_i}$ is a random sample from the c.d.f. $F_i$, $i=0,1,2$, and that all the X's are mutually independent. We wish to test the following null hypothesis

$H_0$: $F_0=\omega F_1+(1-\omega)F_2$ properly.

Thomas (1969), assuming (i) the continuity of $F_0$, $F_1$ and $F_2$, and (ii) $n_0=n_1=n_2$, proposed a statistic to test $H_0$ and showed its asymptotic normality. Hariton (1972), assuming only condition (i), expresses this statistic as

$$T_2=W_{10}+W_{02}-W_{12}-1/2$$


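where

$$W_{ij}=\frac{1}{n_in_j}\sum_{t=1}^{n_i}\sum_{s=1}^{n_j}h(X_{js}-X_{it}),\qquad h(x)=\begin{cases}1&\text{if }x>0\\0&\text{otherwise.}\end{cases} \qquad (3.1.2)$$

Define

$$\alpha_{ij}=\int F_i\,dF_j; \qquad (3.1.3)$$

then

(1) $\alpha_{ij}=1-\alpha_{ji}$ (by continuity)

(2) $EW_{ij}=\alpha_{ij}$

(3) $\mathrm{Var}\,W_{ij}=(n_in_j)^{-1}\big[\alpha_{ij}-(n_i+n_j-1)\alpha_{ij}^2+(n_i-1)\int F_i^2\,dF_j+(n_j-1)\int(1-F_j)^2\,dF_i\big]$

(4) $\mathrm{Cov}(W_{ij},W_{kj})=n_j^{-1}\big[\int F_iF_k\,dF_j-\alpha_{ij}\alpha_{kj}\big]$, $i\ne k$.  (3.1.4)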

(For proofs of (2), (3) and (4), see Birnbaum and Klose (1957).) Integrating both sides of the equation $F_0=\omega F_1+(1-\omega)F_2$ with respect to $F_0$, $F_1$, $F_2$, we have


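$$\begin{aligned}1/2&=\omega\alpha_{10}+(1-\omega)\alpha_{20}\\ \alpha_{01}&=\omega/2+(1-\omega)\alpha_{21}\\ \alpha_{02}&=\omega\alpha_{12}+(1-\omega)/2\end{aligned}$$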

Solving for $\omega$ in each equation and equating these solutions in pairs, we obtain three conditions on the $\alpha$'s. Hariton shows that these three conditions are mutually equivalent. Hence under $H_0$ a (single) necessary condition is

$$\alpha_{10}+\alpha_{02}-\alpha_{12}-1/2=0. \qquad (3.1.5)$$

It follows that under $H_0$, $ET_2=0$.
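A direct Python sketch of $W_{ij}$ in (3.1.2) and of $T_2$ follows; the helper names are hypothetical. The double loop costs $O(n_in_j)$; sorting one sample and counting with binary search would reduce this, but the literal form mirrors the definition.

```python
from typing import Sequence


def w_stat(x_i: Sequence[float], x_j: Sequence[float]) -> float:
    """W_ij of (3.1.2): proportion of pairs with X_it < X_js, i.e. h(X_js - X_it) = 1."""
    hits = sum(1 for a in x_i for b in x_j if b - a > 0)
    return hits / (len(x_i) * len(x_j))


def t2_stat(x0: Sequence[float], x1: Sequence[float], x2: Sequence[float]) -> float:
    """T_2 = W_10 + W_02 - W_12 - 1/2; its expectation is 0 under H0 by (3.1.5)."""
    return w_stat(x1, x0) + w_stat(x0, x2) - w_stat(x1, x2) - 0.5
```

Since $W_{ij}$ estimates $\alpha_{ij}=\Pr(X_i<X_j)$, the first argument of `w_stat` is always the sample whose observations are counted as the smaller ones.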

Let $N=n_0+n_1+n_2$ and suppose that as $N\to\infty$, $n_i/N\to\rho_i$, $i=0,1,2$, and there exists $\varepsilon$ such that $0<\varepsilon\le\rho_i\le1-\varepsilon<1$. Then by the following Theorems 3.1 and 3.2, $T_2$ has an asymptotic normal distribution.

Theorem  3.1 

Let $F_1,\dots,F_k$ be univariate c.d.f.'s and suppose that for $i=1,2,\dots,k$, a random sample $\{X_{i1},\dots,X_{in_i}\}$ of size $n_i$ is available from $F_i$. Assume that all X's are mutually independent. Let $N=n_1+\dots+n_k$, $r_i=n_i/N$, $i=1,\dots,k$, and also assume that there exists $\varepsilon$ such that $0<\varepsilon\le r_i\le1-\varepsilon<1$, $i=1,\dots,k$. Then for any integer $m$, the random vector

$$N^{1/2}\big((W_{i_1j_1}-\alpha_{i_1j_1}),\ (W_{i_2j_2}-\alpha_{i_2j_2}),\ \dots,\ (W_{i_mj_m}-\alpha_{i_mj_m})\big)$$

has an asymptotic multinormal distribution with finite mean vector and finite variance-covariance matrix, provided the $i_l$'s and $j_l$'s are integers such that $1\le i_l\ne j_l\le k$.

Proof: As stated in Hariton (1972), this follows from Theorem 5.5.1 of Puri and Sen (1971), p. 196, by setting $c=2$ and

Theorem  3.2 

Let $X$ be a $k\times1$ vector of random variables, and let $\{X^{(1)},\dots,X^{(n)},\dots\}$ be a sequence of vectors of random variables such that $X^{(n)}\to X$ in distribution. Let $g_1,\dots,g_l$ be continuous functions defined on $R^k$. Then

$$(g_1(X^{(n)}),\dots,g_l(X^{(n)}))\to(g_1(X),\dots,g_l(X))\quad\text{in distribution.}$$

Proof: See Breiman (1968), p. 237.

$\mathrm{Var}\,T_2$ can be derived by using the formulae in (3.1.4). But since the terms in (3.1.4) involve the unknown distributions $F_0$, $F_1$, $F_2$, these terms can only be estimated. One possible way is to use the following extended theorems (see Hariton, 1972) of Woinsky and Kurz (1969, p. 447):

$$\frac{1}{n_an_bn_c}\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}\sum_{k=1}^{n_c}h(X_{ck}-X_{ai})\,h(X_{ck}-X_{bj})\to\int F_aF_b\,dF_c\quad\text{in probability.} \qquad (3.1.6)$$
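Assuming the triple sum in (3.1.6) has the form shown, it equals the plug-in quantity $\int\hat F_a\hat F_b\,d\hat F_c$ computed with strict inequalities, because summing $h(X_{ck}-X_{ai})$ over $i$ counts the observations of sample $a$ below $X_{ck}$. The Python sketch below (hypothetical names) uses this to evaluate the estimator with sorting and binary search, and keeps the literal triple sum for comparison.

```python
from bisect import bisect_left
from itertools import product


def ecdf_strict(sorted_values, x):
    """Fraction of the sample strictly below x (matches h(x) = 1 for x > 0)."""
    return bisect_left(sorted_values, x) / len(sorted_values)


def triple_sum(x_a, x_b, x_c):
    """(1/(n_a n_b n_c)) sum_{i,j,k} h(X_ck - X_ai) h(X_ck - X_bj), via e.d.f.s."""
    s_a, s_b = sorted(x_a), sorted(x_b)
    return sum(ecdf_strict(s_a, x) * ecdf_strict(s_b, x) for x in x_c) / len(x_c)


def triple_sum_brute(x_a, x_b, x_c):
    """Direct O(n_a n_b n_c) evaluation of the same quantity, for checking."""
    total = sum(1 for a, b, c in product(x_a, x_b, x_c) if c - a > 0 and c - b > 0)
    return total / (len(x_a) * len(x_b) * len(x_c))
```

The sorted version costs $O((n_a+n_b+n_c)\log n)$ instead of $O(n_an_bn_c)$.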
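Denote the (consistent) estimator of $\mathrm{Var}\,T_2$ obtained in this way by $\widehat{\mathrm{Var}}\,T_2$. Hariton then obtains the following large sample test:

Reject $H_0$ if $|T_2|(\widehat{\mathrm{Var}}\,T_2)^{-1/2}>z_{1-\alpha/2}$,  (3.1.7)

where $z_{1-\alpha/2}$ is such that $\Phi(z_{1-\alpha/2})=1-\alpha/2$.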


3.2  Problem  A2 

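Suppose now that $F_2$ is known and $X_{i1},\dots,X_{in_i}$ are random samples from $F_i$, $i=0,1$, with all the X's mutually independent. Assume that $F_0$, $F_1$, and $F_2$ are continuous. In order to test the following null hypothesis

$H_0$: $F_0=\omega F_1+(1-\omega)F_2$ properly,

we define random variables

$$R_{i2}=\int\hat F_i(x)\,dF_2(x)=1-\frac{1}{n_i}\sum_{k=1}^{n_i}F_2(X_{ik}),\quad i=0,1, \qquad (3.2.1)$$

where $\hat F_i$ is the e.d.f. corresponding to $X_{i1},\dots,X_{in_i}$. Note that $ER_{ij}=\alpha_{ij}$. Define the statistic

$$T_3=W_{10}+R_{02}-R_{12}-1/2, \qquad (3.2.2)$$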
where $W_{10}$ is defined by (3.1.2). Under $H_0$,

$$ET_3=\alpha_{10}+\alpha_{02}-\alpha_{12}-1/2=0.$$


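$$\mathrm{Var}\,R_{i2}=ER_{i2}^2-\alpha_{i2}^2=\frac{1}{n_i}\Big[\int F_2^2(x)\,dF_i(x)-\alpha_{2i}^2\Big],\quad i=0,1, \qquad (3.2.3)$$

$$\mathrm{Cov}(R_{02},R_{12})=0,$$

$$\mathrm{Cov}(W_{10},R_{12})=\frac{1}{n_1}\Big[\alpha_{10}\alpha_{21}-\int(1-F_0(x))F_2(x)\,dF_1(x)\Big]=\frac{1}{n_1}\Big[\int F_0(x)F_2(x)\,dF_1(x)-\alpha_{01}\alpha_{21}\Big], \qquad (3.2.4)$$

$$\mathrm{Cov}(W_{10},R_{02})=\frac{1}{n_0}\Big[\alpha_{10}\alpha_{20}-\int F_1(x)F_2(x)\,dF_0(x)\Big],$$

and $\mathrm{Var}\,W_{10}$ follows from (3.1.4).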

$\mathrm{Var}\,T_3$ can then be calculated as a linear combination of the above terms. As in section 3.1, it is desirable to have a consistent estimator for $\mathrm{Var}\,T_3$, so that large sample tests can be applied. For this purpose, we need the following:

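Proposition 3.1

(1) $R_{i2}\xrightarrow{p}\alpha_{i2}$, $i=0,1$;

(2) $\dfrac{1}{n_0n_1}\displaystyle\sum_{i=1}^{n_0}\sum_{j=1}^{n_1}h(X_{1j}-X_{0i})F_2(X_{1j})\xrightarrow{p}\int F_0(x)F_2(x)\,dF_1(x)$;

(3) $\dfrac{1}{n_a}\displaystyle\sum_{i=1}^{n_a}F_2^2(X_{ai})\xrightarrow{p}\int F_2^2(x)\,dF_a(x)$, $a=0,1$;

(4) $\dfrac{1}{n_0n_1^2}\displaystyle\sum_{k=1}^{n_0}\sum_{i=1}^{n_1}\sum_{j=1}^{n_1}h(X_{0k}-X_{1i})h(X_{0k}-X_{1j})\xrightarrow{p}\int F_1^2(x)\,dF_0(x)$,

as $n_0,n_1\to\infty$.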

Proof: (4) follows from (3.1.6). Since $ER_{i2}=\alpha_{i2}$,

$$E[h(X_{1j}-X_{0i})F_2(X_{1j})]=\int F_0(x)F_2(x)\,dF_1(x)\quad\text{for } j=1,\dots,n_1;\ i=1,\dots,n_0,$$

$$E[F_2(X_{ai})]^2=\int F_2^2(x)\,dF_a(x),\quad a=0,1,$$

(1) and (3) follow from the weak law of large numbers (since $R_{i2}$ is a mean of i.i.d. random variables). Under the assumption of continuity, after rearranging the terms,

$$\frac{1}{n_0n_1}\sum_{i=1}^{n_0}\sum_{j=1}^{n_1}h(X_{1j}-X_{0i})F_2(X_{1j})=\int\hat F_0(x)F_2(x)\,d\hat F_1(x).$$

Since $\sup_x|\hat F_0(x)-F_0(x)|\to0$ a.s. (by the strong law of large numbers and the continuity of $F_0(x)$),

$$\Big|\int\hat F_0F_2\,d\hat F_1-\int F_0F_2\,dF_1\Big|\le\sup_x|\hat F_0(x)-F_0(x)|+\Big|\frac{1}{n_1}\sum_{i=1}^{n_1}F_0(X_{1i})F_2(X_{1i})-\int F_0(x)F_2(x)\,dF_1(x)\Big|\to0\ \text{a.s.}$$

(by the strong law of large numbers). q.e.d.

After  rearrangement. 


Var  T-  = -i 
3 n, 


0' 


[F^(x)-F2(x)]2dFQ(x)  + ^|[FQ(x)-F2(x)]2dF^(x) 


j^[|Ff(x)dF„Cx)  . |F/(x)dFj(x))  - 


Therefore a consistent estimator for $\mathrm{Var}\,T_3$ is


n,-l 


3'irK-t-r-  ''0'orti»'«orti» 

‘'o"!  n,  V i-1  j=l  k-1  ^ 

n^“l  1 iipj  n, 

* f .f  j!  ''0(ik->'oi)''Pik->‘oj 

0 1 Hq  n^^  1=1  j=l  k=l  •' 


Var  T,  = 


)] 


+ — [—  F,^(Xfx-)]  + 


*^1  "l  i=l  ^ 


1:'  sir  1 j? 


n,  n 

I r 

‘0  “0“1  i=l  j=l 
n„  n 


Oj 


Let $N=n_0+n_1$, and suppose that as $N\to\infty$, $n_i/N\to\rho_i$, $i=0,1$, and there exists $\varepsilon$ such that $0<\varepsilon\le\rho_i\le1-\varepsilon<1$. Then by Theorems 2.1, 2.2, 3.1, and 3.2, under $H_0$,

$$T_3(\widehat{\mathrm{Var}}\,T_3)^{-1/2}\to N(0,1)\quad\text{in distribution.}$$

Therefore a large sample test can be formulated as follows:

Reject $H_0$ if $|T_3|(\widehat{\mathrm{Var}}\,T_3)^{-1/2}>z_{1-\alpha/2}$,  (3.2.6)

where $\Phi(z_{1-\alpha/2})=1-\alpha/2$.
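A minimal Python sketch of $T_3$ follows, with $F_2$ passed in as a callable; the standard normal c.d.f. used in the check is only an illustrative choice of $F_2$, and the function names are hypothetical. The variance estimator is not reproduced here, so the sketch stops at the statistic itself.

```python
import math
from typing import Callable, Sequence


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def w_stat(x_i: Sequence[float], x_j: Sequence[float]) -> float:
    """W_ij of (3.1.2): proportion of pairs with X_it < X_js."""
    hits = sum(1 for a in x_i for b in x_j if b - a > 0)
    return hits / (len(x_i) * len(x_j))


def r_stat(x_i: Sequence[float], f2: Callable[[float], float]) -> float:
    """R_i2 of (3.2.1): 1 - (1/n_i) * sum_k F_2(X_ik), with F_2 known."""
    return 1.0 - sum(f2(x) for x in x_i) / len(x_i)


def t3_stat(x0: Sequence[float], x1: Sequence[float], f2: Callable[[float], float]) -> float:
    """T_3 = W_10 + R_02 - R_12 - 1/2 of (3.2.2)."""
    return w_stat(x1, x0) + r_stat(x0, f2) - r_stat(x1, f2) - 0.5
```

Because $F_2$ is known, only the two samples from $F_0$ and $F_1$ are needed.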


3.3  Problem  A3 


Suppose that $F_1$ and $F_2$ are known, and that $X_1,X_2,\dots,X_n$ is a random sample from $F_0$. Also assume that $F_0$, $F_1$ and $F_2$ are continuous. The null hypothesis is

$H_0$: $F_0(x)=\omega F_1(x)+(1-\omega)F_2(x)$ properly.

First define random variables

$$R_{i0}=\int F_i(x)\,d\hat F_0(x),\quad i=1,2,$$

where $\hat F_0(x)$ is the empirical distribution function corresponding to $X_1,X_2,\dots,X_n$; with $R_{02}$ as in (3.2.1), $R_{02}=1-R_{20}$. Define the test statistic

$$T_4=R_{10}+R_{02}-\alpha_{12}-1/2. \qquad (3.3.1)$$

Then, following an argument similar to that in section 3.2, when $H_0$ is true, we have

$$ET_4=\alpha_{10}+\alpha_{02}-\alpha_{12}-1/2=0.$$

$\mathrm{Var}\,R_{10}$ and $\mathrm{Var}\,R_{02}$ can be derived similarly as in (3.2.3) by changing indices. We have

$$\mathrm{Cov}(R_{10},R_{02})=E(R_{10}R_{02})-\alpha_{10}\alpha_{02}=\frac1n\Big[\alpha_{10}\alpha_{20}-\int F_1(x)F_2(x)\,dF_0(x)\Big]. \qquad (3.3.2)$$


It follows that

$$\mathrm{Var}\,T_4=\mathrm{Var}\,R_{10}+\mathrm{Var}\,R_{02}+2\mathrm{Cov}(R_{10},R_{02})=\frac1n\Big\{-(\alpha_{10}-\alpha_{20})^2+\int[F_1(x)-F_2(x)]^2\,dF_0(x)\Big\}. \qquad (3.3.3)$$
Hence a consistent estimator for $\mathrm{Var}\,T_4$ is

$$\widehat{\mathrm{Var}}\,T_4=\frac1n\Big\{-(R_{10}-R_{20})^2+\frac1n\sum_{i=1}^{n}[F_1(X_i)-F_2(X_i)]^2\Big\}. \qquad (3.3.4)$$

By the central limit theorem and Theorems 2.1, 2.2, 3.1 and 3.2, $T_4(\mathrm{Var}\,T_4)^{-1/2}$ and $T_4(\widehat{\mathrm{Var}}\,T_4)^{-1/2}$ have the same asymptotic distribution, viz., the standard normal distribution $N(0,1)$. Thus a large sample test can be formulated as follows:

Reject $H_0$ if $|T_4|(\widehat{\mathrm{Var}}\,T_4)^{-1/2}>z_{1-\alpha/2}$,

where $\Phi(z_{1-\alpha/2})=1-\alpha/2$.
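Because $T_4$ is a mean of the i.i.d. quantities $F_1(X_i)-F_2(X_i)$ plus a constant, the whole test fits in a few lines. The Python sketch below computes $T_4$, the estimator (3.3.4) and the standardized statistic; the example components $N(-1,1)$ and $N(1,1)$, for which $\alpha_{12}=\Phi(\sqrt2)$, and all function names are illustrative assumptions.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t4_test(sample, f1, f2, alpha12):
    """Return (T_4, estimated Var T_4, standardized statistic) for known F_1, F_2.

    T_4         = R_10 + R_02 - alpha_12 - 1/2                  (3.3.1)
    Var-hat T_4 = {-(R_10 - R_20)^2 + mean[(F_1 - F_2)^2]} / n   (3.3.4)
    """
    n = len(sample)
    r10 = sum(f1(x) for x in sample) / n   # R_10
    r20 = sum(f2(x) for x in sample) / n   # R_20, so that R_02 = 1 - R_20
    t4 = r10 + (1.0 - r20) - alpha12 - 0.5
    mean_sq = sum((f1(x) - f2(x)) ** 2 for x in sample) / n
    var_hat = (-(r10 - r20) ** 2 + mean_sq) / n
    return t4, var_hat, t4 / math.sqrt(var_hat)


if __name__ == "__main__":
    m1, m2 = -1.0, 1.0
    f1 = lambda x: std_normal_cdf(x - m1)
    f2 = lambda x: std_normal_cdf(x - m2)
    alpha12 = std_normal_cdf((m2 - m1) / math.sqrt(2.0))
    rng = random.Random(7)
    xs = [rng.gauss(m1 if rng.random() < 0.4 else m2, 1.0) for _ in range(2000)]
    print(t4_test(xs, f1, f2, alpha12))
```

Reject $H_0$ at level $\alpha$ when the absolute value of the returned standardized statistic exceeds $z_{1-\alpha/2}$ (1.96 for $\alpha=.05$). Note that the estimated variance is the sample variance of $F_1(X_i)-F_2(X_i)$ divided by $n$, so it is never negative.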


3.4  Two  Normal  Components  vs.  a Single  Normal  Alternative 

In this section we study the performance of $T_4$ with respect to the same hypotheses, $H_0$, $H_1$, as in section 2.2, and then compare its approximate power with that of other tests. Assume that $F_i\sim N(m_i,\sigma^2)$, $i=1,2$. The hypotheses to be considered are

$H_0$: $F_0=\omega F_1+(1-\omega)F_2$ properly
$H_1$: $F_0\sim N(\mu,\tau^2)$.

By definition

$$T_4=R_{10}+R_{02}-\alpha_{12}-1/2,$$

where $R_{ij}$ is defined by (3.2.1) and $\alpha_{ij}$ by (3.1.3). It follows that

$$ET_4=\alpha_{10}+\alpha_{02}-\alpha_{12}-1/2$$

and

$$n\mathrm{Var}\,T_4=-(\alpha_{10}-\alpha_{20})^2+\int[F_1(x)-F_2(x)]^2\,dF_0(x).$$

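Under $H_0$, since $\alpha_{10}-\alpha_{20}=\alpha_{12}-1/2$,

$$E(T_4\mid H_0)=0,$$

$$n\mathrm{Var}(T_4\mid H_0)=-(\alpha_{12}-1/2)^2+\int[F_1(x)-F_2(x)]^2\,dF_2(x)+\omega\int[F_1(x)-F_2(x)]^2\,d[F_1(x)-F_2(x)]. \qquad (3.4.1)$$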


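If, furthermore, we let $m_2=-m_1$, then the last term on the right hand side of (3.4.1) vanishes. (Indeed this term vanishes for any continuous $F_1$, $F_2$, since it equals $[(F_1-F_2)^3/3]$ evaluated between $-\infty$ and $\infty$, where $F_1-F_2=0$.) Therefore

$$n\mathrm{Var}(T_4\mid H_0)=-(\alpha_{12}-1/2)^2+\int[F_1(x)-F_2(x)]^2\,dF_2(x)=-\Big[\Phi\Big(-\frac{\sqrt2\,m_1}{\sigma}\Big)-\frac12\Big]^2+\int_{-\infty}^{\infty}\Big[\Phi\Big(\frac{x-m_1}{\sigma}\Big)-\Phi\Big(\frac{x+m_1}{\sigma}\Big)\Big]^2\frac1\sigma\phi\Big(\frac{x+m_1}{\sigma}\Big)dx \qquad (3.4.2)$$

where $\phi(x)$ is the derivative of $\Phi(x)$ and $\Phi(x)$ is the standard normal c.d.f. Note that in (3.4.2) $n\mathrm{Var}(T_4\mid H_0)$ does not depend on $\omega$. A large sample test can be formulated as follows:

Reject $H_0$ if $|T_4|[\mathrm{Var}(T_4\mid H_0)]^{-1/2}>z_{1-\alpha/2}$,  (3.4.3)

where $\Phi(z_{1-\alpha/2})=1-\alpha/2$ and $\alpha$ is the asymptotic significance level.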

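Under $H_1$,

$$E(T_4\mid H_1)=\Phi\Big(\frac{\mu-m_1}{\sqrt{\sigma^2+\tau^2}}\Big)+\Phi\Big(\frac{-\mu-m_1}{\sqrt{\sigma^2+\tau^2}}\Big)-\Phi\Big(-\frac{\sqrt2\,m_1}{\sigma}\Big)-\frac12$$

and

$$n\mathrm{Var}(T_4\mid H_1)=-\Big[\Phi\Big(\frac{\mu-m_1}{\sqrt{\sigma^2+\tau^2}}\Big)-\Phi\Big(\frac{\mu+m_1}{\sqrt{\sigma^2+\tau^2}}\Big)\Big]^2+\int_{-\infty}^{\infty}\Big[\Phi\Big(\frac{x-m_1}{\sigma}\Big)-\Phi\Big(\frac{x+m_1}{\sigma}\Big)\Big]^2\frac1\tau\phi\Big(\frac{x-\mu}{\tau}\Big)dx \qquad (3.4.4)$$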


For a function $f(z)$ having no pole between the real axis and the lines $z=\pm i\pi/h$, it can be shown by contour integration that (Goodwin, 1949)

$$\int_{-\infty}^{\infty}f(x)e^{-x^2}dx=h\sum_{n=-\infty}^{\infty}f(nh)e^{-n^2h^2}+R(h) \qquad (3.4.5)$$

where $|R(h)|<2Ae^{-\pi^2/h^2}$.
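The rule (3.4.5) amounts to a trapezoidal sum. The Python sketch below implements it and applies it to (3.4.2) with $\sigma=1$, after the substitution $x=-m_1+\sqrt2\,y$, which turns $dF_2(x)=\phi(x+m_1)\,dx$ into $\pi^{-1/2}e^{-y^2}dy$. The names are hypothetical, and truncating the infinite sum at $|nh|\le6$ is an assumption of the sketch.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def goodwin(f, h=1.0):
    """h * sum_n f(nh) exp(-(nh)^2), approximating the integral of f(x) exp(-x^2)."""
    n_max = int(math.ceil(6.0 / h))  # exp(-36) is about 2e-16, so later terms are negligible
    return h * sum(f(k * h) * math.exp(-(k * h) ** 2) for k in range(-n_max, n_max + 1))


def nvar_t4_h0(m1, h=1.0):
    """n Var(T_4 | H_0) from (3.4.2) with sigma = 1 and m2 = -m1.

    With x = -m1 + sqrt(2) y the integral against dF_2 becomes
    (1/sqrt(pi)) * integral of [F_1(x) - F_2(x)]^2 exp(-y^2) dy.
    """
    def g(y):
        x = -m1 + math.sqrt(2.0) * y
        return (std_normal_cdf(x - m1) - std_normal_cdf(x + m1)) ** 2

    first = (std_normal_cdf(-math.sqrt(2.0) * m1) - 0.5) ** 2
    return -first + goodwin(g, h) / math.sqrt(math.pi)
```

With $f\equiv1$ and $h=1$ the sum exceeds $\sqrt\pi$ by about .00018, matching the bound for $|R(1)|$ quoted in the text; comparing results for two step sizes $h$ is a simple way to check the quadrature for other integrands.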

We first use (3.4.5) with $h=1$ to calculate the integrals in (3.4.2) and (3.4.4) for various values of $m_1$, $\mu$, $\tau$ with $\sigma^2=1$. When $h=1$,

$$|R(1)|\le2\sqrt\pi\,e^{-\pi^2}=.00018,$$

i.e. the approximation (3.4.5) is accurate to 3 decimals with maximal error $\pm.00018$. From these values we then obtain the approximate power of the test (3.4.3) with respect to the alternative hypothesis from the formula

$$\Phi\Big(\frac{-z_{1-\alpha/2}[\mathrm{Var}(T_4\mid H_0)]^{1/2}-E(T_4\mid H_1)}{[\mathrm{Var}(T_4\mid H_1)]^{1/2}}\Big)+\Phi\Big(\frac{-z_{1-\alpha/2}[\mathrm{Var}(T_4\mid H_0)]^{1/2}+E(T_4\mid H_1)}{[\mathrm{Var}(T_4\mid H_1)]^{1/2}}\Big) \qquad (3.4.6)$$
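Putting (3.4.2), (3.4.4) and (3.4.6) together, the approximate power can be computed as in the following Python sketch ($\sigma=1$, $m_2=-m_1$). It repeats the quadrature helper so as to be self-contained; the default step $h=0.5$ (rather than $h=1$) and all function names are choices of the sketch, not of the original computation. The integral in (3.4.4) uses the substitution $x=\mu+\sqrt2\,\tau y$.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def goodwin(f, h=0.5):
    """h * sum_n f(nh) exp(-(nh)^2), truncated at |nh| <= 6."""
    n_max = int(math.ceil(6.0 / h))
    return h * sum(f(k * h) * math.exp(-(k * h) ** 2) for k in range(-n_max, n_max + 1))


def bump_sq(x, m1):
    """[F_1(x) - F_2(x)]^2 for F_1 = N(m1, 1) and F_2 = N(-m1, 1)."""
    return (std_normal_cdf(x - m1) - std_normal_cdf(x + m1)) ** 2


def nvar_h0(m1, h=0.5):
    """(3.4.2) with sigma = 1: integral against N(-m1, 1) via x = -m1 + sqrt(2) y."""
    integral = goodwin(lambda y: bump_sq(-m1 + math.sqrt(2.0) * y, m1), h) / math.sqrt(math.pi)
    return -(std_normal_cdf(-math.sqrt(2.0) * m1) - 0.5) ** 2 + integral


def moments_h1(m1, mu, tau, h=0.5):
    """E(T_4 | H_1) and n Var(T_4 | H_1) from (3.4.4) with sigma = 1."""
    s = math.sqrt(1.0 + tau ** 2)
    a10 = std_normal_cdf((mu - m1) / s)         # alpha_10
    a02 = std_normal_cdf((-mu - m1) / s)        # alpha_02
    a20 = std_normal_cdf((mu + m1) / s)         # alpha_20
    a12 = std_normal_cdf(-math.sqrt(2.0) * m1)  # alpha_12
    mean = a10 + a02 - a12 - 0.5
    integral = goodwin(lambda y: bump_sq(mu + math.sqrt(2.0) * tau * y, m1), h) / math.sqrt(math.pi)
    return mean, -(a10 - a20) ** 2 + integral


def power_346(m1, mu, tau, n, z=1.959964, h=0.5):
    """Approximate power (3.4.6) of the test (3.4.3)."""
    sd0 = math.sqrt(nvar_h0(m1, h) / n)
    mean, nv1 = moments_h1(m1, mu, tau, h)
    sd1 = math.sqrt(nv1 / n)
    return (std_normal_cdf((-z * sd0 - mean) / sd1)
            + std_normal_cdf((-z * sd0 + mean) / sd1))
```

For example, `power_346(1.0, 0.0, 1.0, 100)` gives the approximate power at the 5% level for $m_1=1$, $\mu=0$, $\tau=1$ and $n=100$.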

Table 3.1 contains the approximate power of the test (3.4.3) at the 5% significance level with $\sigma^2=1$, $m_2=-m_1$, and various values of $m_1$, $\mu$, and $\tau$. Comparing Table 3.1 with Table 2.3, we see that for the case of two normal components versus a single normal alternative, the test using $T_4$ (i.e. (3.4.3)) is more powerful than the test using $\hat\omega_x-\hat\omega_y$ (i.e. (2.1.3)), especially when $\tau=1,2$. Next, comparing Table 3.1, for $n=100,400$, with Johnson's Table IIb (1973, p. 24), which uses another statistic, we see that in most places the power is fairly comparable, except that when $\tau=2$, $m_1=1$, $n=100$, the test using $T_4$ is considerably more powerful.


3.5  Problems  A1',  A2',  A3':  Three  Components


In this section, the following null hypothesis will be tested:

$H_0$: $F_0(x)=\omega_1F_1(x)+\omega_2F_2(x)+(1-\omega_1-\omega_2)F_3(x)$ properly.  (3.5.1)

Let $X_{01},\dots,X_{0n_0}$ be a random sample from $F_0$ and suppose that each of $F_1$, $F_2$, $F_3$ is either known or there is a random sample from it. Also assume that all the r.v.'s in the samples are mutually independent and $F_0$, $F_1$, $F_2$, $F_3$ are continuous.

Integrating both sides of (3.5.1) with respect to $F_i(x)$, for $i=0,1,2,3$, we obtain
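$$\begin{aligned}1/2&=\omega_1\alpha_{10}+\omega_2\alpha_{20}+(1-\omega_1-\omega_2)\alpha_{30}\\ \alpha_{01}&=\omega_1/2+\omega_2\alpha_{21}+(1-\omega_1-\omega_2)\alpha_{31}\\ \alpha_{02}&=\omega_1\alpha_{12}+\omega_2/2+(1-\omega_1-\omega_2)\alpha_{32}\\ \alpha_{03}&=\omega_1\alpha_{13}+\omega_2\alpha_{23}+(1-\omega_1-\omega_2)/2\end{aligned}$$

where $\alpha_{ij}$ is defined by (3.1.3). We introduce the symbols

$$A_i=\alpha_{0i}-\alpha_{3i},\qquad B_i=\alpha_{1i}-\alpha_{3i},\qquad C_i=\alpha_{2i}-\alpha_{3i}$$

for $i=0,1,2,3$. After rearrangement, the above equations can be written as:

$$\begin{aligned}A_0&=\omega_1B_0+\omega_2C_0\\A_1&=\omega_1B_1+\omega_2C_1\\A_2&=\omega_1B_2+\omega_2C_2\\A_3&=\omega_1B_3+\omega_2C_3\end{aligned} \qquad (3.5.2)$$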

A necessary condition for (3.5.2) to have a unique consistent solution for $\{\omega_1,\omega_2\}$ is that any three of the four equations in (3.5.2) have a consistent solution for $\{\omega_1,\omega_2\}$. (A consistent solution is a solution that simultaneously satisfies the designated set of equations.) In other words, each of the following four determinants has value zero:

$$\begin{vmatrix}A_0&B_0&C_0\\A_1&B_1&C_1\\A_2&B_2&C_2\end{vmatrix},\quad\begin{vmatrix}A_0&B_0&C_0\\A_1&B_1&C_1\\A_3&B_3&C_3\end{vmatrix},\quad\begin{vmatrix}A_0&B_0&C_0\\A_2&B_2&C_2\\A_3&B_3&C_3\end{vmatrix},\quad\begin{vmatrix}A_1&B_1&C_1\\A_2&B_2&C_2\\A_3&B_3&C_3\end{vmatrix}.$$
Note that, by definition,

$$A_1-A_0=B_1-B_0,\qquad A_2-A_0=C_2-C_0,$$

from which our conditions can be simplified as follows:

$$\begin{aligned}(C_1-C_0-A_1+A_0)[B_0C_2-B_1(C_0-A_0)-A_0B_2]&=0\\(A_0-A_1)[B_0C_2-B_1(C_0-A_0)-A_0B_2]&=0\\(A_0-A_2)[B_0C_2-B_1(C_0-A_0)-A_0B_2]&=0\\-(B_2-B_1)[B_0C_2-B_1(C_0-A_0)-A_0B_2]&=0.\end{aligned}$$

Suppose that $B_0C_2-B_1(C_0-A_0)-A_0B_2\ne0$. Then

$$C_1-C_0-A_1+A_0=0,\quad A_0-A_1=0,\quad A_0-A_2=0,\quad B_2-B_1=0;$$

it follows that $A_0=A_1=A_2$, $B_0=B_1=B_2$, $C_0=C_1=C_2$. In turn, these give $B_0C_2-B_1(C_0-A_0)-A_0B_2=0$, which is a contradiction. Hence we must have

$$B_0C_2-B_1(C_0-A_0)-A_0B_2=0. \qquad (3.5.3)$$

This is therefore a necessary condition for (3.5.2) to have a unique solution for $\{\omega_1,\omega_2\}$.

The reasons that (3.5.2) must have a unique consistent solution for $\{\omega_1,\omega_2\}$ are as follows.

If there exist two sets of solutions for (3.5.2), say $\{\omega_1,\omega_2\}$ and $\{\omega_1',\omega_2'\}$, with $0<\omega_1$, $0<\omega_2$, $0<\omega_1'$, $0<\omega_2'$, $\omega_1+\omega_2<1$, $\omega_1'+\omega_2'<1$, and either $\omega_1\ne\omega_1'$ or $\omega_2\ne\omega_2'$, then we have

$$F_0(x)=\omega_1F_1(x)+\omega_2F_2(x)+(1-\omega_1-\omega_2)F_3(x)=\omega_1'F_1(x)+\omega_2'F_2(x)+(1-\omega_1'-\omega_2')F_3(x)\quad\text{for all }x,$$

which implies that

$$0=(\omega_1-\omega_1')[F_1(x)-F_3(x)]+(\omega_2-\omega_2')[F_2(x)-F_3(x)].$$

From this it will be shown that the existence of two non-identical solutions $\{\omega_1,\omega_2\}$, $\{\omega_1',\omega_2'\}$ reduces the problem of testing a mixture of three components to one of testing two components. This has already been discussed in sections 3.1 to 3.3, and is not our present interest.

(1) If $\omega_1=\omega_1'$, $\omega_2\ne\omega_2'$, then $F_2(x)\equiv F_3(x)$, which implies that under $H_0$, $F_0(x)=\omega_1F_1(x)+(1-\omega_1)F_2(x)$, a mixture of two components.

(2) If $\omega_1\ne\omega_1'$, $\omega_2=\omega_2'$, then similarly $F_0(x)=\omega_2F_2(x)+(1-\omega_2)F_3(x)$, a mixture of two components.

(3) If $\omega_1\ne\omega_1'$, $\omega_2\ne\omega_2'$, and

(a) $(\omega_1-\omega_1')(\omega_2-\omega_2')>0$, then

$$F_3(x)=\frac{\omega_1-\omega_1'}{\omega_1-\omega_1'+\omega_2-\omega_2'}F_1(x)+\frac{\omega_2-\omega_2'}{\omega_1-\omega_1'+\omega_2-\omega_2'}F_2(x),$$

i.e., $F_3(x)$ is a proper mixture of $F_1(x)$ and $F_2(x)$.

(b) $(\omega_1-\omega_1')(\omega_2-\omega_2')<0$ and $|\omega_1-\omega_1'|<|\omega_2-\omega_2'|$, then

$$F_2(x)=\frac{\omega_1-\omega_1'}{\omega_2'-\omega_2}F_1(x)+\frac{\omega_2'-\omega_2-\omega_1+\omega_1'}{\omega_2'-\omega_2}F_3(x),$$

i.e., $F_2(x)$ is a proper mixture of $F_1(x)$ and $F_3(x)$.

(c) $(\omega_1-\omega_1')(\omega_2-\omega_2')<0$ and $|\omega_1-\omega_1'|>|\omega_2-\omega_2'|$, then

$$F_1(x)=\frac{\omega_2'-\omega_2}{\omega_1-\omega_1'}F_2(x)+\frac{\omega_1-\omega_1'+\omega_2-\omega_2'}{\omega_1-\omega_1'}F_3(x),$$

i.e., $F_1(x)$ is a proper mixture of $F_2(x)$ and $F_3(x)$.

(d) $\omega_1-\omega_1'=\omega_2'-\omega_2$, then

$$0=(\omega_1-\omega_1')[F_1(x)-F_2(x)],$$

which implies that $F_1(x)\equiv F_2(x)$.

In each of the above cases (a) to (d), $F_0(x)$ can be reduced to a proper mixture of two components. For example, if $F_3(x)=\omega F_1(x)+(1-\omega)F_2(x)$, then

$$F_0(x)=\omega_1F_1(x)+\omega_2F_2(x)+(1-\omega_1-\omega_2)F_3(x)=[\omega_1+\omega(1-\omega_1-\omega_2)]F_1(x)+[\omega_2+(1-\omega)(1-\omega_1-\omega_2)]F_2(x),$$

a proper mixture of $F_1$ and $F_2$. In conclusion, for $F_0$ to be a genuine mixture of three components, (3.5.2) must possess a unique consistent solution, a necessary condition for which is (3.5.3).

Rewriting (3.5.3) in terms of the $\alpha$'s, we have

$$0=(1/2-\alpha_{01})(1/2-\alpha_{32})+(1/2-\alpha_{31})(1/2-\alpha_{20})+(1/2-\alpha_{30})(1/2-\alpha_{12}). \qquad (3.5.4)$$
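As a numerical sanity check of (3.5.4), the Python sketch below computes every $\alpha_{ij}$ for an assumed three-component normal mixture, using $\alpha_{ij}=\Phi\big((m_j-m_i)/\sqrt{s_i^2+s_j^2}\big)$ for independent normals and $\alpha_{0j}=\sum_k\omega_k\alpha_{kj}$ (from integrating (3.5.1)), and then evaluates the right-hand side of (3.5.4). The components, weights and function names are illustrative.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def alpha_normal(m_i, s_i, m_j, s_j):
    """alpha_ij = P(X_i < X_j) for independent X_i ~ N(m_i, s_i^2), X_j ~ N(m_j, s_j^2)."""
    return std_normal_cdf((m_j - m_i) / math.sqrt(s_i ** 2 + s_j ** 2))


def condition_354(means, sds, weights):
    """Right-hand side of (3.5.4) when F_0 = sum_k weights[k] * N(means[k], sds[k]^2)."""
    comps = (1, 2, 3)
    a = {}
    for i in comps:
        for j in comps:
            a[i, j] = alpha_normal(means[i - 1], sds[i - 1], means[j - 1], sds[j - 1])
    for j in comps:
        a[0, j] = sum(weights[k - 1] * a[k, j] for k in comps)  # alpha_0j = sum_k w_k alpha_kj
        a[j, 0] = 1.0 - a[0, j]                                 # alpha_j0 = 1 - alpha_0j
    star = lambda i, j: 0.5 - a[i, j]
    return star(0, 1) * star(3, 2) + star(3, 1) * star(2, 0) + star(3, 0) * star(1, 2)
```

The result vanishes up to rounding error for any weights summing to one: writing $\alpha^*_{ij}=1/2-\alpha_{ij}$, one has $\alpha^*_{0j}=\sum_k\omega_k\alpha^*_{kj}$ and $\alpha^*_{ji}=-\alpha^*_{ij}$, and the terms then cancel in pairs.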

Our test statistics will be derived from (3.5.4). First let us consider problem A1'. Besides the random sample from $F_0$, we have for $i=1,2,3$ a random sample $X_{i1},\dots,X_{in_i}$ from $F_i$. Define

$$T_5=(1/2-W_{01})(1/2-W_{32})+(1/2-W_{31})(1/2-W_{20})+(1/2-W_{30})(1/2-W_{12}), \qquad (3.5.5)$$

where $W_{ij}$ is defined as in (3.1.2). Under $H_0$, choose $\varepsilon$ such that

$$0<\varepsilon\le\frac{n_i}{n_0+n_1+n_2+n_3}=p_i\le1-\varepsilon<1,\quad i=0,1,2,3;$$

then by Theorem 3.1

$$((W_{01}-1/2),(W_{32}-1/2),(W_{20}-1/2),(W_{31}-1/2),(W_{12}-1/2),(W_{30}-1/2))$$

has an asymptotic normal distribution with mean

$$((\alpha_{01}-1/2),(\alpha_{32}-1/2),(\alpha_{20}-1/2),(\alpha_{31}-1/2),(\alpha_{12}-1/2),(\alpha_{30}-1/2)) \qquad (3.5.6)$$

and variance-covariance matrix $N^{-1}((\sigma_{ij}))$, $i,j=1,2,\dots,6$, where $N=n_0+n_1+n_2+n_3$ and the $\sigma_{ij}$ are listed below. The function $g(x_1,\dots,x_6)=x_1x_2+x_3x_4+x_5x_6$ is continuous on $R^6$; hence by Theorem 3.2, $NT_5$ has an asymptotic distribution which is that of $Z_1Z_2+Z_3Z_4+Z_5Z_6$, with $(Z_1,\dots,Z_6)$ having a 6-variate normal distribution with mean vector (3.5.6) and variance-covariance matrix $((\sigma_{ij}))$ as follows:

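$$\begin{aligned}
\sigma_{11}&=p_0^{-1}\Big[\int F_1^2\,dF_0-\alpha_{10}^2\Big]+p_1^{-1}\Big[\int F_0^2\,dF_1-\alpha_{01}^2\Big]\\
\sigma_{22}&=p_2^{-1}\Big[\int F_3^2\,dF_2-\alpha_{32}^2\Big]+p_3^{-1}\Big[\int F_2^2\,dF_3-\alpha_{23}^2\Big]\\
\sigma_{33}&=p_2^{-1}\Big[\int F_0^2\,dF_2-\alpha_{02}^2\Big]+p_0^{-1}\Big[\int F_2^2\,dF_0-\alpha_{20}^2\Big]\\
\sigma_{44}&=p_3^{-1}\Big[\int F_1^2\,dF_3-\alpha_{13}^2\Big]+p_1^{-1}\Big[\int F_3^2\,dF_1-\alpha_{31}^2\Big]\\
\sigma_{55}&=p_1^{-1}\Big[\int F_2^2\,dF_1-\alpha_{21}^2\Big]+p_2^{-1}\Big[\int F_1^2\,dF_2-\alpha_{12}^2\Big]\\
\sigma_{45}&=p_1^{-1}\Big[\alpha_{31}\alpha_{21}-\int F_2F_3\,dF_1\Big]\\
\sigma_{46}&=p_3^{-1}\Big[\int F_0F_1\,dF_3-\alpha_{03}\alpha_{13}\Big]
\end{aligned} \qquad (3.5.7)$$

where $p_i=n_i/N$.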

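In the following we will derive the asymptotic variance of $T_5$.

Let $\mu_r'$ denote the $r$-th moment about zero and $\kappa_r$ denote the $r$-th cumulant. Then

$$\mu_4'=\kappa_4+4\kappa_3\kappa_1+3\kappa_2^2+6\kappa_2\kappa_1^2+\kappa_1^4.$$

Denote $\mu'_{1111}=E(Z_1Z_2Z_3Z_4)$; it can be derived formally as follows (David et al. 1966). Write $\mu_4'$ formally as

$$\mu'(r^4)=\kappa(r^4)+4\kappa(r^3)\kappa(r)+3(\kappa(r^2))^2+6\kappa(r^2)(\kappa(r))^2+(\kappa(r))^4$$

and operate with $s\,\partial/\partial r$ on both sides. After cancelling the factor 4 on both sides, we have

$$\mu'(r^3s)=\kappa(r^3s)+3\kappa(r^2s)\kappa(r)+\kappa(r^3)\kappa(s)+3\kappa(r^2)\kappa(rs)+3\kappa(rs)(\kappa(r))^2+3\kappa(r^2)\kappa(r)\kappa(s)+(\kappa(r))^3\kappa(s).$$

Next operate on both sides first with $t\,\partial/\partial r$, then with $v\,\partial/\partial r$. After cancelling certain factors on both sides and putting $r=s=t=v=1$, we have

$$\begin{aligned}\mu'_{1111}={}&\kappa_{1111}+\kappa_{0111}\kappa_{1000}+\kappa_{1110}\kappa_{0001}+\kappa_{1101}\kappa_{0010}+\kappa_{1011}\kappa_{0100}\\&+\kappa_{0011}\kappa_{1100}+\kappa_{1010}\kappa_{0101}+\kappa_{1001}\kappa_{0110}\\&+\kappa_{0110}\kappa_{1000}\kappa_{0001}+\kappa_{0101}\kappa_{1000}\kappa_{0010}+\kappa_{1100}\kappa_{0010}\kappa_{0001}\\&+\kappa_{0011}\kappa_{1000}\kappa_{0100}+\kappa_{1010}\kappa_{0100}\kappa_{0001}+\kappa_{1001}\kappa_{0100}\kappa_{0010}\\&+\kappa_{1000}\kappa_{0100}\kappa_{0010}\kappa_{0001},\end{aligned}$$

where $\kappa_{ijkl}$ is the 4-variate cumulant. Since $(Z_1,Z_2,Z_3,Z_4)$ has a 4-variate normal distribution, all its cumulants of order three and four vanish, so that we have $\kappa_{1111}=\kappa_{0111}=\kappa_{1110}=\kappa_{1101}=\kappa_{1011}=0$.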
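Therefore

$$\begin{aligned}E(Z_1Z_2Z_3Z_4)={}&\sigma_{13}\sigma_{24}+\sigma_{14}\sigma_{23}+\sigma_{13}(\alpha_{32}-1/2)(\alpha_{31}-1/2)+\sigma_{24}(\alpha_{01}-1/2)(\alpha_{20}-1/2)\\&+\sigma_{14}(\alpha_{32}-1/2)(\alpha_{20}-1/2)+\sigma_{23}(\alpha_{01}-1/2)(\alpha_{31}-1/2)\\&+(\alpha_{01}-1/2)(\alpha_{20}-1/2)(\alpha_{32}-1/2)(\alpha_{31}-1/2).\end{aligned}$$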

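Since $E(Z_1Z_2)=\sigma_{12}=0$ and $E(Z_3Z_4)=\sigma_{34}=0$, $\mathrm{Cov}(Z_1Z_2,Z_3Z_4)=E(Z_1Z_2Z_3Z_4)$.

Similarly we have

$$\begin{aligned}\mathrm{Cov}(Z_1Z_2,Z_5Z_6)=E(Z_1Z_2Z_5Z_6)={}&\sigma_{15}\sigma_{26}+\sigma_{16}\sigma_{25}+\sigma_{25}(\alpha_{01}-1/2)(\alpha_{30}-1/2)+\sigma_{26}(\alpha_{01}-1/2)(\alpha_{12}-1/2)\\&+\sigma_{15}(\alpha_{32}-1/2)(\alpha_{30}-1/2)+\sigma_{16}(\alpha_{32}-1/2)(\alpha_{12}-1/2)\\&+(\alpha_{01}-1/2)(\alpha_{12}-1/2)(\alpha_{32}-1/2)(\alpha_{30}-1/2)\end{aligned}$$

and

$$\begin{aligned}\mathrm{Cov}(Z_3Z_4,Z_5Z_6)={}&\sigma_{35}\sigma_{46}+\sigma_{36}\sigma_{45}+\sigma_{45}(\alpha_{20}-1/2)(\alpha_{30}-1/2)+\sigma_{46}(\alpha_{20}-1/2)(\alpha_{12}-1/2)\\&+\sigma_{35}(\alpha_{31}-1/2)(\alpha_{30}-1/2)+\sigma_{36}(\alpha_{31}-1/2)(\alpha_{12}-1/2)\\&+(\alpha_{20}-1/2)(\alpha_{31}-1/2)(\alpha_{12}-1/2)(\alpha_{30}-1/2).\end{aligned}$$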

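By the same method it can be shown that

$$\begin{aligned}\mathrm{Var}(Z_1Z_2)&=[\sigma_{11}+(\alpha_{01}-1/2)^2][\sigma_{22}+(\alpha_{32}-1/2)^2],\\ \mathrm{Var}(Z_3Z_4)&=[\sigma_{33}+(\alpha_{20}-1/2)^2][\sigma_{44}+(\alpha_{31}-1/2)^2],\\ \mathrm{Var}(Z_5Z_6)&=[\sigma_{55}+(\alpha_{12}-1/2)^2][\sigma_{66}+(\alpha_{30}-1/2)^2].\end{aligned}$$

Finally the asymptotic variance of $NT_5$ can be derived from the following:

$$\begin{aligned}\mathrm{Var}(Z_1Z_2+Z_3Z_4+Z_5Z_6)={}&\mathrm{Var}(Z_1Z_2)+\mathrm{Var}(Z_3Z_4)+\mathrm{Var}(Z_5Z_6)\\&+2\mathrm{Cov}(Z_1Z_2,Z_3Z_4)+2\mathrm{Cov}(Z_1Z_2,Z_5Z_6)+2\mathrm{Cov}(Z_3Z_4,Z_5Z_6).\end{aligned} \qquad (3.5.8)$$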
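The expansion of $\mu'_{1111}$ above has one term for each of the 15 partitions of $\{1,2,3,4\}$, each term being the product of the cumulants of the blocks; for a normal vector only blocks of size one (means) and two (covariances) survive. The Python sketch below enumerates set partitions and applies this rule. It is a generic illustration with hypothetical names, not the original computation.

```python
from typing import Iterator, List, Sequence


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of items into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                     # first in a block of its own
        for k in range(len(part)):                 # or joined to an existing block
            yield part[:k] + [[first] + part[k]] + part[k + 1:]


def gaussian_product_moment(mu: Sequence[float], cov: Sequence[Sequence[float]],
                            idx: List[int]) -> float:
    """E(prod of Z_i, i in idx) for a normal vector.

    Cumulants of order >= 3 vanish, so only partitions into singletons
    (contributing means) and pairs (contributing covariances) are kept.
    """
    total = 0.0
    for part in set_partitions(idx):
        if any(len(block) > 2 for block in part):
            continue
        term = 1.0
        for block in part:
            term *= mu[block[0]] if len(block) == 1 else cov[block[0]][block[1]]
        total += term
    return total
```

With all four indices pointing at the same $N(1,1)$ variable (means 1, covariances 1) the rule gives $1+6+3=10=EZ^4$, and with zero means it reduces to $\sigma_{12}\sigma_{34}+\sigma_{13}\sigma_{24}+\sigma_{14}\sigma_{23}$.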



From the above argument we see that the asymptotic variance of $T_5$ (i) is not so easily derived, and (ii) has nonlinear terms. And when we come to derive a consistent estimator for it, we become involved in tedious work and have to work out various estimators for different terms in (3.5.8).

The reason for this tedious derivation is that $T_5$ has terms which are products of random variables. If we look again at the definition of $T_5$, i.e. (3.5.5), we find that the only possible situation in which $T_5$ is a linear combination of random variables would be when all the components $F_1$, $F_2$, $F_3$ are known, viz. problem A3'. In this latter case we can modify $T_5$ to

$$T_5'=(1/2-R_{01})(1/2-\alpha_{32})+(1/2-\alpha_{31})(1/2-R_{20})+(1/2-R_{30})(1/2-\alpha_{12}), \qquad (3.5.9)$$

where $R_{ij}$ is defined as in (3.2.1).

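Under $H_0$, $ET_5'=0$, and

$$n_0\mathrm{Var}\,T_5'=-[-(1/2-\alpha_{32})\alpha_{10}+(1/2-\alpha_{31})\alpha_{20}+(1/2-\alpha_{12})\alpha_{30}]^2+\int[-(1/2-\alpha_{32})F_1+(1/2-\alpha_{31})F_2+(1/2-\alpha_{12})F_3]^2\,dF_0, \qquad (3.5.10)$$

from which a consistent estimator for $n_0\mathrm{Var}\,T_5'$ is

$$n_0\widehat{\mathrm{Var}}\,T_5'=-[-(1/2-\alpha_{32})R_{10}+(1/2-\alpha_{31})R_{20}+(1/2-\alpha_{12})R_{30}]^2+\frac{1}{n_0}\sum_{i=1}^{n_0}[-(1/2-\alpha_{32})F_1(X_{0i})+(1/2-\alpha_{31})F_2(X_{0i})+(1/2-\alpha_{12})F_3(X_{0i})]^2. \qquad (3.5.11)$$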
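The following Python sketch evaluates $T_5'$, the estimator (3.5.11) and the standardized statistic when all three components are known. The c.d.f.s are passed in a dictionary and the three required $\alpha$'s in another; these containers, the normal components used in the check, and the function name are assumptions for illustration.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t5_prime_test(sample, f, alpha):
    """T_5' of (3.5.9), the estimate (3.5.11) of n_0 Var T_5', and T_5' / sd.

    f     -- dict {1: F_1, 2: F_2, 3: F_3} of known c.d.f.s
    alpha -- dict holding the known alpha_32, alpha_31, alpha_12 under keys (3, 2), (3, 1), (1, 2)
    """
    n0 = len(sample)
    c1 = 0.5 - alpha[3, 2]
    c2 = 0.5 - alpha[3, 1]
    c3 = 0.5 - alpha[1, 2]
    r10 = sum(f[1](x) for x in sample) / n0
    r20 = sum(f[2](x) for x in sample) / n0
    r30 = sum(f[3](x) for x in sample) / n0
    r01 = 1.0 - r10
    t5p = (0.5 - r01) * c1 + c2 * (0.5 - r20) + (0.5 - r30) * c3
    g = [-c1 * f[1](x) + c2 * f[2](x) + c3 * f[3](x) for x in sample]
    g_bar = sum(g) / n0                                   # the bracket in (3.5.11)
    n0_var = -g_bar ** 2 + sum(v * v for v in g) / n0     # (3.5.11)
    return t5p, n0_var, t5p / math.sqrt(n0_var / n0)
```

Since $T_5'$ is, up to a constant, minus the mean of $g(X_{0i})$, the estimate (3.5.11) is simply the sample variance of $g(X_{0i})$.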

Since $T_5'$ has an asymptotic normal distribution, a large sample test can be formulated as follows:

Reject $H_0$ if $|T_5'|(\widehat{\mathrm{Var}}\,T_5')^{-1/2}>z_{1-\alpha/2}$.


3.6  Problems  A1',  A2',  A3':  Four  Components

Suppose that $X_{i1},\dots,X_{in_i}$ is a random sample from $F_i$, $i=0,1,2,3,4$. Assume that all X's are mutually independent and all $F_i$ are continuous. We wish to test the following null hypothesis:

$H_0$: $F_0=\omega_1F_1+\omega_2F_2+\omega_3F_3+(1-\omega_1-\omega_2-\omega_3)F_4$ properly.

Integrating both sides of $H_0$ with respect to $F_i$, we have

$$\alpha_{0i}=\omega_1\alpha_{1i}+\omega_2\alpha_{2i}+\omega_3\alpha_{3i}+(1-\omega_1-\omega_2-\omega_3)\alpha_{4i},\quad i=0,1,2,3,4, \qquad (3.6.1)$$

where $\alpha_{ij}$ is defined by (3.1.3). Denote by

$$A_i=\alpha_{0i}-\alpha_{4i},\quad B_i=\alpha_{1i}-\alpha_{4i},\quad C_i=\alpha_{2i}-\alpha_{4i},\quad D_i=\alpha_{3i}-\alpha_{4i},\quad i=0,1,2,3,4. \qquad (3.6.2)$$

Then (3.6.1) can be rewritten as

$$A_i=\omega_1B_i+\omega_2C_i+\omega_3D_i,\quad i=0,1,2,3,4.$$

Following the same argument as in section 3.5, by setting each of the five 4×4 determinants to zero, we have

0-[  CAj-A,,)  tCj-Cp-  (A^-Ajj)  (Dj-Dj)-  (Aj^-Ad)  CCj-Cp]  [-BjD2*C2  (Dl-D3)'‘DjB2] 
0-  [ (A^-Ag)  CBj-Bj)  - CB„-Bp  CD^-Dj)  - (Aj-A^)  (B^-Bp  ] [C^Aj-D,  (A^-Ap  -A^C,] 

»-[  f^-V  1 tW’if^rV' Vii 

0=  [ (Dj-Dj)  CAj-Aq)  - CAj-Ap  (Dj-Dj)  - CD^-D,)  (B^-Bp  ] [-BjA^-A^B^-C^  CA^-Ap  ] 
(N  [ (Dj-Dp  (A^-Aj)  - {Aj-Ap  (Cj-Cj)  - (Aj-Ap  (03-03)  ] [ (Bj-O,)  (C„-Ap 

- (C3-A3)  (B(|-Ap* (Cj-Aj)  (Oj-A^) ] (3.6.3) 

By definition we have the following relations:

$$A_0-A_1=B_0-B_1,\qquad A_0-A_2=C_0-C_2,\qquad A_0-A_3=D_0-D_3,$$

$$B_1-B_2=C_1-C_2,\qquad B_1-B_3=D_1-D_3,\qquad C_2-C_3=D_2-D_3.$$

From these relations it follows that the first factors of the equations in (3.6.3) are all equal. Suppose that this common factor is not zero; then the following equations must hold simultaneously:

$$A_0B_3+D_3(A_1-A_0)-B_1A_3=0 \qquad (3.6.4)$$

$$(B_3-D_3)(C_0-A_0)-(C_3-A_3)(B_0-A_0)+(C_1-A_1)(D_0-A_0)=0.$$

But

$$\begin{aligned}&(A_3-A_0)(C_1-C_2)-(A_2-A_0)(D_1-D_3)-(A_1-A_0)(C_3-C_2)\\&=[-B_1D_2+C_2(D_1-D_3)+D_3B_2]-[C_2A_3-D_3(A_2-A_0)-A_0C_3]\\&\quad+[(B_3-D_3)(C_0-A_0)-(C_3-A_3)(B_0-A_0)+(C_1-A_1)(D_0-A_0)],\end{aligned}$$

which is equal to zero by (3.6.4), and this contradicts our previous assumption. Therefore when $H_0$ is true we must have

$$(A_3-A_0)(C_1-C_2)-(A_2-A_0)(D_1-D_3)-(A_1-A_0)(C_3-C_2)=0.$$

Or, in terms of the $\alpha$'s,


(3.6.5) 


“30“2l'^“M“02'*’“32“i0‘“40“2r“4l“62"“42“i0““40“l3‘“4l“30‘“43“l0 


■“40“32‘“42“63'^“43“62'“4l“23‘“42“n'^43“h  ^ ° 


(3.6.6) 


where $\alpha^*_{ij}=1/2-\alpha_{ij}$. Thus an appropriate test statistic can be defined as follows:
T — W 4.w>  W 4-W*  W*  .W  W*  .W  W -W  -W  Itf>  -W*  w

^6  " "3o”21  "31%2  ”32  10  "40"21  '^41  02  "42  10  ”4o"l3  *^4l‘''30 


-W»  W -W»  U*  -W*  W*  4.1*1'  W -W  W -W  W 4.W  w 

"43"l0  "40*^32  "42”03  ”43"02  "4r23  "42*^31  "43^21 


(3.6.7) 


where $W^*_{ij}=1/2-W_{ij}$, and $W_{ij}$ is defined by (3.1.2). Under $H_0$, $ET_6=0$ (by independence). The asymptotic distribution of $T_6$ can be derived similarly to that of $T_5$ in section 3.5, but it is difficult to identify unless all the component distributions are known, viz. we are dealing with problem A3'. In this latter case a modified statistic would be


^6  “ ^0“'2l'^“'31^02’^40“'2l’“'41^02’“42^0’^40“i3*“k^0’“'43*H0 


■^“32^0'^40“32"“42^03'^“43^02"“4l“23"“42“k'^“43“n  (3.6.8) 

where $R^*_{ij}=1/2-R_{ij}$ and $R_{ij}$ is defined by (3.2.1).

Under $H_0$, $ET_6'=0$ and

$$n_0\mathrm{Var}\,T_6'=-[a\alpha_{10}+b\alpha_{20}+c\alpha_{30}+d\alpha_{40}]^2+\int[aF_1+bF_2+cF_3+dF_4]^2\,dF_0, \qquad (3.6.9)$$

where

^ " “32  ■ “42  ■ “43  ’ ^ " ‘“k  “41  * “43 


them is either known or to be estimated from a given sample. Denote by $\alpha_{01}=\int F_0\,dF_1$ as in (3.1.3).

Assume that both $F_0$, $F_1$ are continuous. Under $H_0$ we have

$$\alpha_{01}-1/2=0. \qquad (3.7.1)$$

We may use the statistic

$$W_{01}-1/2$$

to test $H_0$, where $W_{01}$ is defined by (3.1.2).

In Table 3.2, the necessary conditions derived in previous sections for testing mixtures of 2, 3, and 4 components are tabulated. For completeness, we also include $k=1$ by regarding it as a 'one-component mixture'. These conditions are readily seen to follow a certain pattern, such that the condition for $k$ components can be derived directly from that for $(k-1)$ components without going through tedious manipulation. The rules for such derivations can be summarized in the following algorithm:

1. For $k$ even, generate new terms by substituting $k$ for the index $i$ everywhere in the necessary condition for $(k-1)$ components, $i=0,1,\dots,k-1$; then subtract all the new terms generated from the $(k-1)$-component condition.

2. For $k$ odd, first group all terms in the condition for $(k-1)$ components in the manner shown in Table 3.2. (This can also be done by preserving the order of terms when the $(k-1)$-component condition is derived from that of $(k-2)$ components.) Next, multiply each group by $\alpha^*_{ik}$, where $i$ is the index missed in the group (each group has exactly one such index). Then subtract all (new) terms generated from the $(k-1)$-component condition. A (conjectured) 5-component condition is derived in this way. It is only a conjecture and has not been derived analytically.
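Rule 1 is mechanical enough to sketch in Python. Below, a condition is stored as a dictionary from products of $\alpha^*_{ij}$ factors (with $\alpha^*_{ij}=1/2-\alpha_{ij}$) to integer coefficients, and $\alpha^*_{ji}=-\alpha^*_{ij}$ is used to put each factor in a canonical order. Rule 2 is not implemented, and the representation and names are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]
Term = Tuple[Pair, ...]        # a product of alpha*_{ij} factors
Condition = Dict[Term, int]    # term -> integer coefficient


def normalize(term):
    """Order each factor as i < j using alpha*_ji = -alpha*_ij; return (sign, sorted term)."""
    sign, factors = 1, []
    for i, j in term:
        if i > j:
            i, j, sign = j, i, -sign
        factors.append((i, j))
    return sign, tuple(sorted(factors))


def add_term(cond, term, coef):
    """Add coef * term to cond, merging like terms and dropping zeros."""
    sign, key = normalize(term)
    cond[key] = cond.get(key, 0) + sign * coef
    if cond[key] == 0:
        del cond[key]


def even_step(prev, k):
    """Rule 1: build the k-component condition (k even) from the (k-1)-component one."""
    new: Condition = {}
    for term, coef in prev.items():
        add_term(new, term, coef)
    for i in range(k):
        for term, coef in prev.items():
            swapped = tuple((k if a == i else a, k if b == i else b) for a, b in term)
            add_term(new, swapped, -coef)   # subtract every generated term
    return new
```

Starting from the one-component condition (3.7.1), written as $\alpha^*_{01}=0$, the rule gives $\alpha^*_{01}+\alpha^*_{12}-\alpha^*_{02}=0$, which is (3.1.5). Starting from (3.5.4), $\alpha^*_{01}\alpha^*_{32}+\alpha^*_{31}\alpha^*_{20}+\alpha^*_{30}\alpha^*_{12}=0$, it produces 15 products, one for each way of choosing four of the indices $0,\dots,4$ and splitting them into two pairs.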


Table  3.2 


Note that for the case of $k$ components, with $k=2$, 3, or 4, the necessary condition is one that includes all the possible combinations of pairs of different indices out of the $(k+1)$ indices $0,1,2,\dots,k$. This can also be seen from the last column of Table 3.2.

3.8  Problem  E1

In this section we study the problem of reducing the number of components of a given finite proper mixture by considering two special cases, viz., that of reducing from (1) four components to three, and (2) three components to two. The methods used to derive the test statistics in both cases are the same; they are included here for the purpose of completeness, and because if these two tests are applied consecutively, we would be able to test for reduction from four components to two components.

In each case, we first derive necessary conditions from the null hypothesis, and then test statistics. As before, we may suppose that each component distribution is either known or there is a sample from it. If not all component distributions are known, the asymptotic distribution of the test statistic is difficult to identify; so from now on we will suppose that each component is known.

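3.8.1 Reducing Four Components to Three

Let $X_1, \ldots, X_n$ be a random sample from $F_0$ and suppose that $F_1$, $F_2$, $F_3$, $F_4$ are known and continuous. Also assume that $F_0$ is continuous and that

$$F_0 = \omega_1 F_1 + \omega_2 F_2 + \omega_3 F_3 + (1 - \omega_1 - \omega_2 - \omega_3) F_4 \quad \text{properly.} \qquad (3.8.1)$$

We wish to test the following null hypothesis:

$$H_0: 1 - \omega_1 - \omega_2 - \omega_3 = 0.$$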

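Define $\alpha_{ij}$ as in (3.1.3) and $A_i$, $B_i$, $C_i$ as in (3.6.2). Under $H_0$, we have

$$F_0 = \omega_1 F_1 + \omega_2 F_2 + (1 - \omega_1 - \omega_2) F_3 \quad \text{properly.} \qquad (3.8.2)$$

Integrating both sides of (3.8.2) with respect to $F_i$, $i = 0, 1, 2, 3, 4$, we obtain

$$A_i = \omega_1 B_i + \omega_2 C_i, \qquad i = 0, 1, 2, 3, 4. \qquad (3.8.3)$$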

As discussed in section 3.5, when (3.8.2) holds, then necessarily

$$B_0 C_2 - B_1 (C_0 - A_0) - A_0 B_2 = 0. \qquad (3.5.3)$$

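If this is so, then the first four equations in (3.8.3) have a unique consistent solution for $(\omega_1, \omega_2)$. Let this solution be

$$\omega_1 = \frac{A_1 C_2 - A_2 C_1}{B_1 C_2 - B_2 C_1}, \qquad \omega_2 = \frac{B_1 A_2 - B_2 A_1}{B_1 C_2 - B_2 C_1}.$$

From assumption (3.8.1) and $H_0$, $(\omega_1, \omega_2)$ must also satisfy the fifth equation in (3.8.3), i.e., we must have

$$A_4 (B_1 C_2 - B_2 C_1) = (A_1 C_2 - A_2 C_1) B_4 + (B_1 A_2 - B_2 A_1) C_4,$$

or

$$0 = \alpha_{04}(B_1C_2 - B_2C_1) + \alpha_{01}(B_2C_4 - C_2B_4) + \alpha_{02}(C_1B_4 - B_1C_4) - \alpha_{34}(B_1C_2 - B_2C_1) - \alpha_{31}(B_2C_4 - C_2B_4) - \alpha_{32}(C_1B_4 - B_1C_4). \qquad (3.8.4)$$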
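The step from (3.8.3) to (3.8.4) can be checked numerically. The sketch below uses hypothetical function names and arbitrary test values for $B_i$ and $C_i$. It solves the equations for $i = 1, 2$ by Cramer's rule, then evaluates the left-hand side of (3.8.4) written in terms of the $A_i$. That expression is $(B_1C_2 - B_2C_1)(A_4 - \omega_1 B_4 - \omega_2 C_4)$, so it vanishes exactly when the fifth equation of (3.8.3) holds with the same $(\omega_1, \omega_2)$.

```python
def solve_omegas(A, B, C):
    """Solve A_i = w1 B_i + w2 C_i, i = 1, 2, by Cramer's rule (lists indexed 0..4)."""
    det = B[1] * C[2] - B[2] * C[1]
    if det == 0:
        raise ValueError("B_1 C_2 - B_2 C_1 = 0: no unique solution")
    return (A[1] * C[2] - A[2] * C[1]) / det, (B[1] * A[2] - B[2] * A[1]) / det

def condition_384(A, B, C):
    """A_4(B_1C_2 - B_2C_1) + A_1(B_2C_4 - C_2B_4) + A_2(C_1B_4 - B_1C_4)."""
    return (A[4] * (B[1] * C[2] - B[2] * C[1])
            + A[1] * (B[2] * C[4] - C[2] * B[4])
            + A[2] * (C[1] * B[4] - B[1] * C[4]))

# Arbitrary illustrative values; A is built so that (3.8.3) holds with (0.3, 0.5).
B = [0.1, 0.3, -0.2, 0.05, 0.4]
C = [0.2, -0.1, 0.25, 0.0, -0.3]
A = [0.3 * b + 0.5 * c for b, c in zip(B, C)]
print(solve_omegas(A, B, C), condition_384(A, B, C))
```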

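The right-hand side of (3.8.4) is a linear combination of $\alpha_{04}$, $\alpha_{01}$ and $\alpha_{02}$, the only terms that depend on the unknown $F_0$. Condition (3.5.3) can be rewritten as

$$0 = \alpha_{10} C_2 - \alpha_{20} B_1 - \alpha_{30}(C_2 - B_2) + \tfrac12 (B_1 - B_2). \qquad (3.8.5)$$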

The above argument shows that (3.8.4) and (3.8.5) are two necessary conditions for both assumption (3.8.1) and $H_0$ to hold. Therefore we may define the test statistics

$$T_7 = R_{10} C_2 - R_{20} B_1 - R_{30}(C_2 - B_2) + \tfrac12 (B_1 - B_2), \qquad (3.8.6)$$

$$T_8 = (R_{04} - \alpha_{34})(B_1C_2 - B_2C_1) + (R_{01} - \alpha_{31})(B_2C_4 - C_2B_4) + (R_{02} - \alpha_{32})(C_1B_4 - B_1C_4),$$

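where $R_{ij}$ is defined in (3.2.1). The expectations of $T_7$ and $T_8$ are equal to the right-hand sides of equations (3.8.5) and (3.8.4), respectively. $\operatorname{Var} T_7$, $\operatorname{Var} T_8$ and $\operatorname{Cov}(T_7, T_8)$ can be calculated by using (3.2.3) and (3.3.2), as follows:

$$n \operatorname{Var} T_7 = \int [C_2F_1 - B_1F_2 - (C_2 - B_2)F_3]^2 \, dF_0 - [C_2\alpha_{10} - B_1\alpha_{20} - (C_2 - B_2)\alpha_{30}]^2 = t_{11}, \text{ say},$$

$$n \operatorname{Var} T_8 = \int [(B_1C_2 - B_2C_1)F_4 + (B_2C_4 - C_2B_4)F_1 + (C_1B_4 - B_1C_4)F_2]^2 \, dF_0 - [(B_1C_2 - B_2C_1)\alpha_{40} + (B_2C_4 - C_2B_4)\alpha_{10} + (C_1B_4 - B_1C_4)\alpha_{20}]^2 = t_{22}, \text{ say},$$

$$n \operatorname{Cov}(T_7, T_8) = [C_2\alpha_{10} - B_1\alpha_{20} - (C_2 - B_2)\alpha_{30}]\,[(B_1C_2 - B_2C_1)\alpha_{40} + (B_2C_4 - C_2B_4)\alpha_{10} + (C_1B_4 - B_1C_4)\alpha_{20}]$$
$$\qquad - \int [C_2F_1 - B_1F_2 - (C_2 - B_2)F_3]\,[(B_1C_2 - B_2C_1)F_4 + (B_2C_4 - C_2B_4)F_1 + (C_1B_4 - B_1C_4)F_2] \, dF_0 = t_{12}, \text{ say}.$$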

$t_{11}$, $t_{22}$ and $t_{12}$ can be estimated consistently, term by term, by Proposition 3.1. We will denote these consistent estimators by $\hat t_{11}$, $\hat t_{22}$ and $\hat t_{12}$, respectively. After rearranging, $T_7$ and $T_8$ can each be expressed as a mean of $n$ i.i.d. random variables, i.e.

$$T_7 = \frac1n \sum_{k=1}^n \big\{C_2F_1(X_k) - B_1F_2(X_k) - (C_2 - B_2)F_3(X_k)\big\} + \tfrac12 (B_1 - B_2), \qquad (3.8.8)$$

$$T_8 = \frac1n \sum_{k=1}^n \big\{(B_1C_2 - B_2C_1)(1 - F_4(X_k) - \alpha_{34}) + (B_2C_4 - C_2B_4)(1 - F_1(X_k) - \alpha_{31}) + (C_1B_4 - B_1C_4)(1 - F_2(X_k) - \alpha_{32})\big\}. \qquad (3.8.9)$$

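Hence if $0 < n\operatorname{Var} T_7 < \infty$, by the central limit theorem,

$$(T_7 - ET_7)(\operatorname{Var} T_7)^{-1/2} \to N(0,1) \text{ in distribution}.$$

And if $0 < n\operatorname{Var} T_8 < \infty$,

$$(T_8 - ET_8)(\operatorname{Var} T_8)^{-1/2} \to N(0,1) \text{ in distribution}.$$

Let $T = (T_7, T_8)$. Then $T$ has an asymptotic bivariate normal distribution.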


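Define

$$Q_n = nT \begin{pmatrix} t_{11} & t_{12} \\ t_{12} & t_{22} \end{pmatrix}^{-1} T'.$$

Then $Q_n$ has an asymptotic chi-square distribution with 2 degrees of freedom (Wilks, 1962, p. 261). Applying Theorems 2.1 and 2.2, $Q_n$ and

$$\hat Q_n = nT \begin{pmatrix} \hat t_{11} & \hat t_{12} \\ \hat t_{12} & \hat t_{22} \end{pmatrix}^{-1} T' \qquad (3.8.10)$$

have the same asymptotic chi-square distribution with 2 degrees of freedom. Under $H_0$, $ET_7 = ET_8 = 0$, so $\hat Q_n$ has an asymptotic central chi-square distribution. Otherwise $\hat Q_n$ has an asymptotic noncentral chi-square distribution with 2 degrees of freedom and noncentrality parameter

$$n(ET) \begin{pmatrix} t_{11} & t_{12} \\ t_{12} & t_{22} \end{pmatrix}^{-1} (ET)'. \qquad (3.8.11)$$

Therefore a large-sample test is:

$$\text{reject } H_0 \text{ if } \hat Q_n \text{ is too large.} \qquad (3.8.12)$$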
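As an illustration of (3.8.10) and (3.8.12), the Python sketch below computes $\hat Q_n$ from the per-observation terms whose means are $T_7$ and $T_8$ in (3.8.8) and (3.8.9). One substitution should be stated plainly. The covariance matrix is estimated by the sample covariance of those i.i.d. terms (divisor $n$). That estimator is consistent, but it is not necessarily the term-by-term estimator of Proposition 3.1. With 2 degrees of freedom, the chi-square upper tail is exactly $e^{-q/2}$, so no statistics library is needed. The level-$\alpha$ critical value is $-2\ln\alpha$, about 5.99 for $\alpha = .05$.

```python
import math

def quadratic_form_test(u, v):
    """Return (Q_hat, p-value) for T = (mean(u), mean(v)).

    u[k] and v[k] are the per-observation terms whose means are T_7 and T_8;
    their sample covariance (divisor n) estimates [[t11, t12], [t12, t22]].
    """
    n = len(u)
    t7, t8 = sum(u) / n, sum(v) / n
    s11 = sum((x - t7) ** 2 for x in u) / n
    s22 = sum((y - t8) ** 2 for y in v) / n
    s12 = sum((x - t7) * (y - t8) for x, y in zip(u, v)) / n
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("estimated covariance matrix is singular")
    # n * T Sigma^{-1} T', using the explicit inverse of a 2 x 2 matrix
    q = n * (t7 * t7 * s22 - 2 * t7 * t8 * s12 + t8 * t8 * s11) / det
    return q, math.exp(-q / 2)  # chi-square(2) upper tail is exp(-q/2)

critical_05 = -2 * math.log(0.05)  # reject H0 at level .05 if Q_hat exceeds this
```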


3.8.2 Reducing Three Components to Two

Let $X_1, \ldots, X_n$ be a random sample from $F_0$ and suppose that $F_1$, $F_2$, $F_3$ are known and continuous. Also assume that $F_0$ is continuous and

$$F_0 = \omega_1 F_1 + \omega_2 F_2 + (1 - \omega_1 - \omega_2) F_3 \quad \text{properly.} \qquad (3.8.13)$$

We wish to test the following null hypothesis:

$$H_0: 1 - \omega_1 - \omega_2 = 0.$$


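Under $H_0$,

$$F_0 = \omega_1 F_1 + (1 - \omega_1) F_2 \quad \text{properly.} \qquad (3.8.14)$$

Integrating both sides of (3.8.14) with respect to $F_i$, $i = 0, 1, 2, 3$, we have

$$A_i = \omega_1 B_i, \qquad i = 0, 1, 2, 3, \qquad (3.8.15)$$

where now $A_i = \alpha_{0i} - \alpha_{2i}$ and $B_i = \alpha_{1i} - \alpha_{2i}$. When $H_0$ is true, the following condition must hold:

$$\alpha_{10} + \alpha_{02} - \alpha_{12} - \tfrac12 = 0, \qquad (3.1.5)$$

from which it follows that the first three equations in (3.8.15) have a unique consistent solution for $\omega_1$. Let this solution be

$$\omega_1 = A_2 / B_2 = (\alpha_{02} - \tfrac12) / (\alpha_{12} - \tfrac12).$$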
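These relations can be verified numerically for a concrete choice of known components. Purely for illustration, the sketch assumes unit-variance normal components $F_k = N(m_k, 1)$. For these, $\alpha_{ij} = \int F_i\,dF_j = \Pr(X_i \le X_j) = \Phi((m_j - m_i)/\sqrt2)$. The sketch builds $\alpha_{0j}$ from $F_0 = \omega_1F_1 + (1-\omega_1)F_2$, checks (3.1.5), and recovers $\omega_1 = A_2/B_2$. It also evaluates condition (3.8.16), which is obtained by substituting this solution into the last equation of (3.8.15).

```python
from statistics import NormalDist

PHI = NormalDist().cdf

def alpha_normal(mi, mj):
    """alpha_ij = int F_i dF_j = P(X_i <= X_j) for independent N(mi, 1), N(mj, 1)."""
    return PHI((mj - mi) / 2 ** 0.5)

def population_alphas(means, omega):
    """alpha[i][j], i, j = 0..3, when F_k = N(means[k-1], 1), k = 1, 2, 3,
    and F0 = omega F1 + (1 - omega) F2, i.e. H0 of section 3.8.2 holds."""
    a = [[0.5] * 4 for _ in range(4)]
    for i in range(1, 4):
        for j in range(1, 4):
            a[i][j] = alpha_normal(means[i - 1], means[j - 1])
    for j in range(1, 4):
        a[0][j] = omega * a[1][j] + (1 - omega) * a[2][j]  # int F0 dF_j is linear in F0
        a[j][0] = 1.0 - a[0][j]                             # continuity: alpha_ij + alpha_ji = 1
    return a

def condition_315(a):
    return a[1][0] + a[0][2] - a[1][2] - 0.5

def omega1_hat(a):
    return (a[0][2] - 0.5) / (a[1][2] - 0.5)

def condition_3816(a):
    return (a[0][3] - a[2][3]) * (a[1][2] - 0.5) - (a[1][3] - a[2][3]) * (a[0][2] - 0.5)

a = population_alphas([-1.0, 1.0, 3.0], omega=0.3)
print(condition_315(a), omega1_hat(a), condition_3816(a))
```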

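Substituting into the last equation in (3.8.15), we have

$$(\alpha_{03} - \alpha_{23})(\alpha_{12} - \tfrac12) - (\alpha_{13} - \alpha_{23})(\alpha_{02} - \tfrac12) = 0. \qquad (3.8.16)$$

Let $T$ denote the statistic defined in (3.3.1), and define

$$T_9 = (R_{03} - \alpha_{23})(\alpha_{12} - \tfrac12) - (\alpha_{13} - \alpha_{23})(R_{02} - \tfrac12), \qquad (3.8.17)$$

where $R_{ij}$ is defined by (3.2.1). (3.1.5) and (3.8.16) are two necessary conditions when assumption (3.8.13) and $H_0$ both hold. $n\operatorname{Var} T$ was derived in (3.3.3) and will be denoted by $s_{11}$, while

$$n\operatorname{Var} T_9 = \int [(\alpha_{12} - \tfrac12)F_3 - (\alpha_{13} - \alpha_{23})F_2]^2 \, dF_0 - [(\alpha_{12} - \tfrac12)\alpha_{30} - (\alpha_{13} - \alpha_{23})\alpha_{20}]^2 = s_{22}, \text{ say},$$

$$n\operatorname{Cov}(T, T_9) = (\alpha_{10} - \alpha_{20})[(\alpha_{12} - \tfrac12)\alpha_{30} - (\alpha_{13} - \alpha_{23})\alpha_{20}] - \int (F_1 - F_2)[(\alpha_{12} - \tfrac12)F_3 - (\alpha_{13} - \alpha_{23})F_2] \, dF_0 = s_{12}, \text{ say}.$$

Similarly we can estimate $s_{11}$, $s_{12}$, $s_{22}$ consistently by Proposition 3.1,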
CHAPTER IV

A SIMULATION STUDY

In this chapter we study a new statistic that can be used to test hypotheses of proper mixtures when the components are known, e.g. problems A3, B1 and D2. A computational algorithm is constructed and properties of this kind of statistic are discussed. We then apply simulation procedures to three special cases.

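4.1 Outline

Let $X_1, \ldots, X_n$ be a random sample from $F_0$ and suppose that $F_1$ and $F_2$ are known, continuous and

We wish to test the following null hypothesis:

$$H_0: F_0 = \omega F_1 + (1 - \omega) F_2 \quad \text{properly.} \qquad (4.1.2)$$

Define a statistic

$$D = \inf_{0 \le \omega \le 1} \sup_x |\hat F_{0n}(x) - F_\omega(x)| = \inf_{0 \le \omega \le 1} \sup_x |\hat F_{0n}(x) - \omega F_1(x) - (1 - \omega) F_2(x)|, \qquad (4.1.3)$$

where $F_\omega = \omega F_1 + (1 - \omega) F_2$ and $\hat F_{0n}(x)$ is the empirical distribution function corresponding to $X_1, \ldots, X_n$. $D$ can be rewritten as

$$D = \inf_{0 \le \omega \le 1} \ \max_{j = 1, \ldots, n} K_j(\omega), \qquad (4.1.4)$$

where

$$K_j(\omega) = \max\Big\{ \Big|\frac jn - \omega F_1(X_{(j)}) - (1 - \omega) F_2(X_{(j)})\Big|, \ \Big|\frac{j-1}{n} - \omega F_1(X_{(j)}) - (1 - \omega) F_2(X_{(j)})\Big| \Big\} \qquad (4.1.5)$$

and $X_{(j)}$ is the $j$-th order statistic of $X_1, \ldots, X_n$.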

On the other hand, the distribution of $D$ can be expressed as a weighted mean of conditional distributions (Behboodian, 1972) as follows:

$$\Pr\{D \le d \mid \omega\} = \sum_{i=0}^n \binom ni \omega^i (1 - \omega)^{n-i} \Pr\{D \le d \mid i \text{ out of } n \ X\text{'s are from } F_1 \text{ and the rest from } F_2\}, \qquad (4.1.6)$$

where $\omega$ is the probability that $X_k$ comes from $F_1$. We will use (4.1.5) to calculate the conditional distributions and then use (4.1.6) to obtain the (unconditional) distribution of $D$.
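The weighting in (4.1.6) is a binomial mixture over the number $i$ of observations drawn from $F_1$. The minimal Python sketch below assumes the conditional probabilities have already been obtained, e.g. by simulation as in section 4.3. The numbers in the usage line are made up for illustration.

```python
from math import comb

def unconditional_cdf(cond_cdf, omega):
    """(4.1.6): cond_cdf[i] = Pr{D <= d | i of the n X's come from F1}, i = 0, ..., n."""
    n = len(cond_cdf) - 1
    return sum(comb(n, i) * omega**i * (1 - omega)**(n - i) * p
               for i, p in enumerate(cond_cdf))

# Usage with made-up conditional probabilities for n = 2 (illustration only).
print(unconditional_cdf([0.2, 0.5, 0.8], omega=0.25))
```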

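4.2 An Algorithm

In this section we describe an algorithm that computes, from a given random sample, the value of $D$ and the value $\beta$ of $\omega$ minimizing the expression (4.1.3). From (4.1.5), $K_j(\omega)$, as a function of $\omega$, is defined in the following ways:

(1) When $(2j-1)/(2n) \le F_2(X_{(j)})$, we have

$$K_j(\omega) = F_2(X_{(j)}) - \frac{j-1}{n} + \omega[F_1(X_{(j)}) - F_2(X_{(j)})], \qquad (4.2.1)$$

an increasing function of $\omega$.

(2) When $F_2(X_{(j)}) < (2j-1)/(2n) < F_1(X_{(j)})$, we have

$$K_j(\omega) = \begin{cases} \dfrac jn - F_2(X_{(j)}) - \omega[F_1(X_{(j)}) - F_2(X_{(j)})], & \text{if } 0 \le \omega < \omega_j^*, \\[6pt] F_2(X_{(j)}) - \dfrac{j-1}{n} + \omega[F_1(X_{(j)}) - F_2(X_{(j)})], & \text{if } \omega_j^* \le \omega \le 1, \end{cases} \qquad (4.2.2)$$

with

$$\omega_j^* = \frac{(2j-1)/(2n) - F_2(X_{(j)})}{F_1(X_{(j)}) - F_2(X_{(j)})}, \qquad (4.2.3)$$

decreasing first in the interval $[0, \omega_j^*)$, then increasing in the interval $[\omega_j^*, 1]$.

(3) When $F_1(X_{(j)}) \le (2j-1)/(2n)$, we have

$$K_j(\omega) = \frac jn - F_2(X_{(j)}) - \omega[F_1(X_{(j)}) - F_2(X_{(j)})], \qquad (4.2.4)$$

a decreasing function of $\omega$.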
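The case analysis can be written as a short classification routine. The sketch below returns the case number and, in case (2), the kink $\omega_j^*$ of (4.2.3); the names are ours. Like the monotonicity statements in the text, it assumes $F_1(X_{(j)}) \ge F_2(X_{(j)})$. When that inequality is reversed, the increasing and decreasing pieces swap roles.

```python
def classify(j, n, f1, f2):
    """Case (1), (2) or (3) of section 4.2 for K_j, plus omega_j^* of (4.2.3) in case (2).

    f1 = F1(X_(j)), f2 = F2(X_(j)); assumes f1 >= f2, as the text's monotonicity
    statements require.
    """
    mid = (2 * j - 1) / (2 * n)
    if mid <= f2:
        return 1, None
    if mid < f1:
        return 2, (mid - f2) / (f1 - f2)
    return 3, None

def K(j, n, f1, f2, w):
    """K_j(omega) from (4.1.5)."""
    g = w * f1 + (1 - w) * f2
    return max(abs(j / n - g), abs((j - 1) / n - g))

print(classify(1, 2, 0.5, 0.3), classify(1, 2, 0.5, 0.1), classify(1, 2, 0.2, 0.1))
```

In case (2), the minimum of $K_j$ is attained at $\omega_j^*$ and equals $1/(2n)$, because there $\omega F_1 + (1-\omega)F_2$ sits exactly halfway between $(j-1)/n$ and $j/n$.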

It can be seen from the above argument that, for each $j = 1, \ldots, n$, $K_j(\omega)$ is either a single straight-line segment or two connected straight-line segments. From (4.1.4) we see that the value of $D$ is attained when $\omega$ is the minimax point among all the intersection points of the straight lines $K_j(\omega)$, $j = 1, \ldots, n$, or at an endpoint of $[0, 1]$. Using this fact, we construct a computational algorithm for finding the values of $D$ and $\beta$ when a random sample is given. (Programs written in Fortran IV are available from the author.)

Step 1: For each $j = 1, \ldots, n$, let $a(j) = 1$, 2 or 3 according to whether $K_j(\omega)$ is of case 1, 2 or 3, respectively. Let

$$b(0) = \max_{j = 1, \ldots, n} K_j(0). \qquad (4.2.5)$$

Denote

$$K_j'(0) = \frac{dK_j(\omega)}{d\omega}\Big|_{\omega = 0}.$$

Then find $j_0$ such that

$$K_{j_0}'(0) = \max\{K_j'(0) : K_j(0) = b(0)\}.$$

If more than one such $j_0$ exists, choose any one of them.

Step 2: If $a(j_0) = 1$, put

$$D = b(0), \quad \text{and} \quad \beta = 0, \qquad (4.2.6)$$

and the algorithm stops. Otherwise let $\omega_0 = 0$ and proceed to Step 3.

Step 3: Compute the intersection points of the line $K_{j_0}(\omega)$ with all the other lines $K_j(\omega)$, and denote these points by $\omega_j'$, $j \ne j_0$. Let

$$\omega_1 = \min_{j \ne j_0} \{\omega_j' : \omega_j' > \omega_0\}. \qquad (4.2.7)$$

Step 4: If $\omega_1 \ge 1$ (that is, no intersection point lies in $(\omega_0, 1)$), put

$$D = K_{j_0}(1), \qquad (4.2.8)$$

$$\beta = 1, \qquad (4.2.9)$$

and stop the algorithm. Otherwise let $j_1$ be such that

$$K_{j_1}'(\omega_1) = \max\{K_j'(\omega_1) : \omega_j' = \omega_1\}.$$

If more than one such $j_1$ exists, choose any one of them.

Step 5: If $a(j_1) = 1$, or both $a(j_1) = 2$ and $\omega_{j_1}^* < \omega_1$, with $\omega_j^*$ defined by (4.2.3), put

$$D = K_{j_1}(\omega_1), \quad \text{and} \quad \beta = \omega_1, \qquad (4.2.10)$$

and stop the algorithm. Otherwise return to Step 3, with $j_1$ and $\omega_1$ taking the roles of $j_0$ and $\omega_0$.
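The stepwise search above walks along the upper envelope of the lines. A compact alternative, which is not the thesis's Fortran program, rests on the same fact. $\max_j K_j(\omega)$ is the upper envelope of $2n$ straight lines, so it is convex and piecewise linear. Its minimum over $[0,1]$ is therefore attained at an endpoint or at an intersection of two of the lines. The Python sketch below evaluates the envelope at every such candidate. This costs $O(n^3)$ operations, which is adequate for the sample sizes used here. The uniform components in the usage lines are hypothetical.

```python
from itertools import combinations

def k_lines(x_sorted, F1, F2):
    """The 2n lines (intercept, slope) whose upper envelope is max_j K_j(omega).

    With G_j(omega) = F2(X_(j)) + omega [F1(X_(j)) - F2(X_(j))], the two absolute
    values in (4.1.5) reduce to K_j(omega) = max{ j/n - G_j, G_j - (j-1)/n }.
    """
    n = len(x_sorted)
    lines = []
    for j, x in enumerate(x_sorted, start=1):
        f1, f2 = F1(x), F2(x)
        lines.append((j / n - f2, -(f1 - f2)))      # j/n - G_j(omega)
        lines.append((f2 - (j - 1) / n, f1 - f2))   # G_j(omega) - (j-1)/n
    return lines

def envelope(lines, w):
    return max(a + b * w for a, b in lines)

def mixture_ks(sample, F1, F2):
    """Return (D, beta) of (4.1.3) by checking 0, 1 and all pairwise crossings in (0, 1)."""
    lines = k_lines(sorted(sample), F1, F2)
    candidates = {0.0, 1.0}
    for (a1, b1), (a2, b2) in combinations(lines, 2):
        if b1 != b2:
            w = (a2 - a1) / (b1 - b2)
            if 0.0 < w < 1.0:
                candidates.add(w)
    beta = min(candidates, key=lambda w: envelope(lines, w))
    return envelope(lines, beta), beta

# Usage with hypothetical uniform components: F1 = U(0, 1), F2 = U(0, 2).
F1 = lambda x: min(max(x, 0.0), 1.0)
F2 = lambda x: min(max(x / 2.0, 0.0), 1.0)
print(mixture_ks([0.1, 0.3, 0.5, 0.9, 1.4, 1.8], F1, F2))
```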


4.3 Numerical Results

For exploratory purposes, three numerical examples are studied by simulation. The component distributions are, respectively:

(1) $F_1 \sim N(-1.5, 1)$ and $F_2 \sim N(1.5, 1)$

(2) $F_1 \sim N(-2, 1)$ and $F_2 \sim N(2, 1)$

(3) $F_1 \sim E(1.5)$ and $F_2 \sim N(1, 1/16)$.

The simulation procedure can be described as follows:

(a) For each $i = 0, 1, \ldots, n$, where $n$ is the sample size, a set of random numbers $\{X_1, \ldots, X_i\}$ is generated according to the distribution $F_1$, and a second set of random numbers $\{X_{i+1}, \ldots, X_n\}$ is generated according to the distribution $F_2$. The two sets are combined to form a sample for $F_0$, denoted by $s(i \mid n)$, in which $i$ of the $n$ random variables are distributed as $F_1$ and the rest as $F_2$. Applying the algorithm of section 4.2 to $s(i \mid n)$ gives a pair of values $(\beta, D)$.

(b) Step (a) is repeated until the desired number of samples is reached (200 in our cases), and the pairs of values $(\beta, D)$ are computed. From these values of $D$, we compute the conditional empirical distributions of $D$, denoted by $\hat F_n(x \mid i)$, given that $i$ of the $n$ X's are distributed as $F_1$ and the rest as $F_2$, $i = 0, 1, \ldots, n$.

(c) From (4.1.6) we calculate, for fixed $\omega$, the unconditional empirical distribution of $D$.
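Steps (a) to (c) can be sketched as follows. To keep the block short, $D$ is approximated by scanning $\omega$ over a grid of 201 points rather than by the exact algorithm of section 4.2. The number of replications is a parameter; the thesis used 200, and the usage lines use 50. The seed and the choice $\omega = .3$ in the usage lines are arbitrary.

```python
import random
from math import comb, erf, sqrt

def normal_cdf(mu, sd):
    return lambda x: 0.5 * (1.0 + erf((x - mu) / (sd * sqrt(2.0))))

def mixture_ks_grid(sample, F1, F2, grid=201):
    """Approximate (D, beta) by scanning omega over an equally spaced grid
    (a simplification of the exact algorithm of section 4.2)."""
    xs = sorted(sample)
    n = len(xs)
    vals = [(F1(x), F2(x)) for x in xs]
    best = None
    for g in range(grid):
        w = g / (grid - 1)
        k = 0.0
        for j, (a, b) in enumerate(vals, start=1):
            G = w * a + (1 - w) * b
            k = max(k, j / n - G, G - (j - 1) / n)
        if best is None or k < best[0]:
            best = (k, w)
    return best

def simulate_conditional(n, i, draw1, draw2, reps, F1, F2, rng):
    """Steps (a)-(b): D values for samples with exactly i draws from F1."""
    ds = []
    for _ in range(reps):
        sample = [draw1(rng) for _ in range(i)] + [draw2(rng) for _ in range(n - i)]
        ds.append(mixture_ks_grid(sample, F1, F2)[0])
    return ds

def unconditional_edf(d, omega, cond):
    """Step (c): combine the conditional e.d.f.'s at d with the weights of (4.1.6)."""
    n = len(cond) - 1
    total = 0.0
    for i, ds in enumerate(cond):
        edf_i = sum(v <= d for v in ds) / len(ds)
        total += comb(n, i) * omega**i * (1 - omega)**(n - i) * edf_i
    return total

# Case (1): F1 ~ N(-1.5, 1), F2 ~ N(1.5, 1), n = 10.
rng = random.Random(12345)
F1, F2 = normal_cdf(-1.5, 1.0), normal_cdf(1.5, 1.0)
draw1 = lambda r: r.gauss(-1.5, 1.0)
draw2 = lambda r: r.gauss(1.5, 1.0)
n, reps = 10, 50
cond = [simulate_conditional(n, i, draw1, draw2, reps, F1, F2, rng) for i in range(n + 1)]
for d in (0.15, 0.20, 0.25, 0.30):
    print(d, round(unconditional_edf(d, 0.3, cond), 3))
```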

For case (1), samples of sizes 10, 20 and 40 were generated. The resulting empirical distributions are summarized in Table 4.1.

For cases (2) and (3), samples of sizes 10 and 20 were generated. The resulting empirical distributions are summarized in Tables 4.2 and 4.3.


Table 4.1 Empirical distribution function of the statistic D
calculated by (4.1.7) when the components are
N(-1.5,1), N(1.5,1) respectively

Sample size n = 10

d \ ω   .0    .05   .10   .15   .20   .25   .30   .35   .40   .45   .50
.095    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0
.120    .01   .025  .037  .046  .053  .058  .061  .063  .064  .065  .065
.130    .04   .060  .072  .080  .087  .092  .096  .099  .101  .103  .103
.140    .075  .097  .112  .124  .134  .143  .152  .160  .166  .170  .171
.150    .150  .164  .180  .195  .211  .226  .240  .252  .262  .267  .269
.160    .210  .224  .246  .270  .294  .317  .336  .353  .365  .372  .374
.170    .250  .277  .311  .343  .373  .399  .422  .440  .453  .461  .464
.180    .290  .336  .383  .427  .467  .502  .531  .554  .571  .581  .584
.190    .320  .390  .454  .511  .560  .600  .633  .657  .674  .684  .687
.200    .355  .439  .514  .579  .632  .675  .707  .730  .746  .755  .758
.210    .395  .494  .578  .647  .700  .740  .769  .789  .802  .809  .812
.220    .470  .559  .637  .702  .754  .793  .820  .839  .850  .856  .858
.230    .535  .617  .689  .749  .800  .835  .863  .882  .894  .900  .903
.240    .595  .669  .732  .785  .827  .860  .884  .901  .912  .919  .921
.255    .665  .731  .784  .829  .865  .893  .915  .930  .940  .946  .948
.265    .705  .575  .802  .842  .876  .904  .925  .941  .951  .957  .959
.280    .740  .797  .842  .878  .908  .931  .948  .960  .968  .973  .974
.285    .775  .802  .858  .890  .917  .938  .954  .966  .974  .978  .980
.295    .815  .851  .881  .907  .929  .948  .963  .974  .981  .986  .987
.315    .840  .877  .906  .929  .947  .962  .973  .981  .986  .989  .990
.330    .855  .894  .923  .945  .962  .974  .982  .988  .991  .993  .994
.355    .895  .925  .947  .963  .975  .983  .989  .993  .996  .997  .998
.385    .935  .952  .966  .976  .983  .989  .992  .995  .997  .998  .999
.395    .95   .963  .973  .980  .986  .991  .994  .996  .998  .998  .999
.415    .975  .982  .986  .990  .993  .995  .997  .998  .999  .999  .999
.595    1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0

Sample size n = 20

d \ ω   .0    .05   .10   .15   .20   .25   .30   .35   .40   .45   .50
.059    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0
.094    .015  .047  .067  .082  .092  .098  .102  .105  .108  .110  .111
.108    .10   .134  .179  .214  .238  .251  .261  .269  .276  .282  .284
.122    .220  .288  .355  .402  .433  .453  .468  .481  .492  .500  .503
.129    .275  .361  .437  .488  .522  .546  .566  .582  .594  .602  .605
.143    .425  .504  .587  .650  .694  .724  .743  .755  .763  .767  .769
.164    .580  .661  .737  .792  .831  .859  .878  .890  .897  .901  .903
.185    .725  .779  .838  .880  .907  .924  .936  .944  .951  .954  .956
.199    .785  .833  .879  .914  .937  .953  .963  .970  .975  .978  .979
.227    .880  .913  .938  .957  .971  .981  .988  .993  .995  .997  .997
.262    .955  .966  .977  .986  .992  .995  .998  .999  .999  1.0   1.0
.339    1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0

Sample size n = 40

d \ ω   .0    .05   .10   .15   .20   .25   .30   .35   .40   .45   .50
.040    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0
.060    .010  .016  .023  .025  .029  .034  .036  .035  .033  .034  .034
.080    .165  .191  .236  .264  .284  .305  .327  .344  .352  .353  .352
.100    .420  .518  .564  .597  .631  .662  .682  .692  .698  .704  .707
.120    .635  .748  .796  .828  .857  .882  .899  .910  .918  .923  .925
.130    .735  .827  .876  .902  .918  .931  .940  .948  .954  .959  .961
.140    .805  .870  .913  .940  .954  .963  .969  .975  .980  .983  .984
.150    .860  .903  .942  .965  .977  .983  .987  .989  .990  .992  .992
.170    .945  .954  .973  .987  .933  .995  .997  .998  .999  1.0   1.0
.190    .965  .972  .983  .992  .996  .998  .999  .999  .999  1.0   1.0
.210    .980  .991  .995  .997  .999  .999  1.0   1.0   1.0   1.0   1.0
.270    1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
Table 4.2 Empirical distribution function of the statistic D
calculated by (4.1.7) when the components are N(-2,1),
N(2,1) respectively

Sample size n = 10

d \ ω   .0    .05   .10   .15   .20   .25   .30   .35   .40   .45   .50
.085    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0
.120    .020  .027  .040  .053  .061  .066  .069  .069  .069  .069  .068
.140    .055  .095  .130  .157  .175  .186  .192  .195  .197  .198  .198
.160    .150  .213  .263  .301  .326  .344  .357  .365  .371  .374  .375
.180    .270  .347  .409  .455  .490  .515  .533  .546  .555  .560  .562
.200    .415  .499  .566  .618  .656  .683  .702  .716  .724  .729  .730
.220    .515  .604  .671  .720  .758  .787  .809  .825  .837  .844  .846
.240    .605  .682  .745  .794  .833  .863  .886  .903  .914  .921  .924
.260    .705  .770  .822  .862  .894  .917  .935  .948  .957  .962  .964
.280    .785  .831  .869  .900  .925  .944  .958  .968  .975  .979  .980
.300    .840  .875  .905  .929  .948  .962  .972  .980  .985  .987  .988
.380    .955  .968  .977  .983  .988  .992  .995  .996  .998  .999  .999
.490    1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0

Sample size n = 20

d \ ω   .0    .05   .10   .15   .20   .25   .30   .35   .40   .45   .50
.058    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0
.094    .050  .076  .082  .084  .087  .090  .094  .098  .102  .105  .106
.112    .150  .228  .258  .268  .271  .275  .282  .292  .303  .311  .314
.130    .280  .397  .456  .487  .504  .515  .528  .543  .558  .569  .573
.148    .445  .564  .629  .669  .696  .715  .713  .746  .758  .766  .769
.166    .620  .719  .771  .806  .828  .844  .856  .867  .877  .884  .886
.184    .740  .819  .860  .886  .905  .919  .930  .938  .943  .947  .948
.202    .815  .877  .909  .930  .947  .959  .968  .974  .978  .980  .981
.220    .875  .925  .949  .964  .975  .984  .989  .993  .994  .995  .995
.238    .905  .963  .975  .983  .990  .994  .997  .999  .999  1.0   1.0
.256    .930  .978  .986  .992  .994  .997  .998  .999  .999  1.0   1.0
.418    1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
Table 4.3 Empirical distribution function of the statistic D
calculated by (4.1.7) when the components are
E(1.5), N(1,1/16) respectively

Sample size n = 10

d \ ω   .0    .1    .2    .3    .4    .5    .6    .7    .8    .9    1.0
.085    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0
.120    .01   .018  .029  .041  .052  .059  .058  .052  .042  .034  .035
.160    .175  .215  .260  .299  .322  .327  .318  .299  .271  .235  .205
.200    .420  .501  .565  .619  .657  .675  .670  .638  .576  .488  .405
.240    .600  .712  .771  .813  .841  .853  .847  .821  .773  .699  .605
.280    .800  .855  .885  .905  .919  .925  .922  .904  .868  .815  .755
.320    .870  .915  .937  .948  .955  .958  .954  .942  .918  .885  .840
.360    .920  .955  .967  .972  .973  .973  .972  .967  .956  .938  .910
.400    .970  .985  .988  .987  .989  .990  .991  .991  .988  .978  .960
.440    .985  .992  .995  .996  .996  .996  .995  .994  .992  .986  .985
.560    1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0

Sample size n = 20

d \ ω   .0    .1    .2    .3    .4    .5    .6    .7    .8    .9    1.0
.058    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0    .0
.103    .075  .122  .149  .168  .178  .178  .172  .157  .142  .136  .085
.121    .180  .283  .342  .362  .372  .384  .393  .382  .355  .324  .210
.139    .325  .453  .517  .545  .571  .593  .598  .580  .555  .530  .430
.157    .500  .615  .674  .706  .737  .758  .759  .743  .718  .672  .545
.175    .640  .727  .788  .820  .841  .850  .845  .833  .823  .787  .645
.202    .810  .854  .883  .905  .922  .936  .940  .931  .911  .873  .760
.238    .910  .946  .958  .966  .974  .978  .977  .973  .968  .957  .890
.265    .965  .973  .985  .990  .991  .990  .986  .982  .979  .965  .935
.301    .975  .992  .996  .997  .998  .998  .995  .992  .992  .986  .970
.418    1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0
Tables 4.1-4.3 show that:

(1) In general, as the sample size n increases, the entire range of the empirical distribution function moves toward zero. This is as expected, since when the hypothesis of mixture is true, the value of D should be close to zero.

(2) For the two normal cases, the values in Table 4.1 are generally less than the corresponding values in Table 4.2 for fixed n, d and ω. This indicates that the statistic D performs better when the two components are further apart.

(3) In all three cases, as the actual mixing proportion ω increases from zero to one, the mean of the e.d.f.'s first decreases, then increases. The same phenomenon holds for almost all the percentiles of the e.d.f.'s. In other words, the statistic D, when used to test the hypothesis of mixture, will perform better when the actual mixing proportion is near .5.

(4) In all three cases, when .1 < ω < .9, the e.d.f.'s remain relatively constant for varying ω. This is especially true as the sample size n becomes larger.

The average values of β over each set of 200 samples are summarized in Table 4.4. It appears from Table 4.4 that, unless the actual proportion ω = i/n is close to either 0 or 1, the averaged values of β are very close to ω. This indicates that the above procedure can also be used to estimate the true proportion parameter of the mixture.

Using nominal significance levels α = .025 and .05, we also calculate, from the above values of D, approximate values $\bar\alpha$ of the actual significance levels, given in Tables 4.5-4.7. More explicitly, the $\bar\alpha$'s are calculated by the following formula:
Table 4.4 Average values of β over 200 generated samples
(In cases (1) and (2), when ω > .5, use 1 minus the value
corresponding to 1 - ω.)

The actual    Case (1)             Case (2)      Case (3)
proportion
ω = i/n       n=10   n=20   n=40   n=10   n=20   n=10   n=20
.0            .0555  .0560  .0382  .0635  .0411  .1051  .0880
.025          -      -      .0511  -      -      -      -
.050          -      .0701  .0612  -      .0732  -      .0987
.075          -      -      .0743  -      -      -      -
.100          .1072  .1092  .0995  .1182  .1052  .1738  .1445
.125          -      -      .1286  -      -      -      -
.150          -      .1575  .1415  -      .1430  -      .1763
.175          -      -      .1710  -      -      -      -
.200          .1964  .1945  .1964  .1926  .1969  .2210  .2012
.225          -      -      .2173  -      -      -      -
.250          -      .2384  .2378  -      .2406  -      .2576
.275          -      -      .2622  -      -      -      -
.300          .2908  .2941  .3064  .2948  .3059  .2944  .2967
.325          -      -      .3277  -      -      -      -
.350          -      .3498  .3454  -      .3552  -      .3443
.375          -      -      .3692  -      -      -      -
.400          .3909  .4011  .3975  .4085  .4080  .3809  .3739
.425          -      -      .4280  -      -      -      -
.450          -      .4510  .4461  -      .4487  -      .4352
.500          .5022  .4890  .4977  .4906  .5006  .4739  .4525
.550                                             -      .5546
.600                                             .5930  .6116
.650                                             -      .6403
.700                                             .6703  .6472
.750                                             -      .7463
.800                                             .7308  .7787
.850                                             -      .7909
.900                                             .7768  .8429
.950                                             -      .8577
1.000                                            .8294  .8823
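$$\bar\alpha = \sum_{i=0}^n \binom ni \omega^i (1 - \omega)^{n-i} \Pr\{\text{points } (\beta, D) \text{ above the curve} \mid i \text{ out of } n \ X\text{'s are distributed as } F_1\}, \qquad (4.3.1)$$

where the curve is formed by connecting all the points $(i/n, D_{1-\alpha, i})$, $i = 1, \ldots, n$, with $D_{1-\alpha, i}$ satisfying

$$1 - \alpha = \Pr\{D \le D_{1-\alpha, i} \mid i \text{ out of } n \ X\text{'s are distributed as } F_1\}. \qquad (4.3.2)$$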
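The computation of $\bar\alpha$ in (4.3.1) and (4.3.2) can be sketched as below; the helper names are ours. Two conventions are assumptions, since the text does not specify them: the empirical quantile rule in `upper_quantile`, and the flat extension of the curve outside its outermost points. In practice the curve points would be $(i/n, \text{upper\_quantile}(D\text{-values for } i, 1-\alpha))$. The usage lines use made-up pairs for $n = 1$.

```python
from bisect import bisect_left
from math import ceil, comb

def upper_quantile(values, level):
    """Empirical D_level: smallest observed value with at least `level` of the sample <= it."""
    s = sorted(values)
    return s[max(0, ceil(level * len(s)) - 1)]

def make_curve(points):
    """Piecewise-linear curve through points (i/n, D_{1-alpha,i}), flat beyond the ends."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def curve(b):
        if b <= xs[0]:
            return ys[0]
        if b >= xs[-1]:
            return ys[-1]
        k = bisect_left(xs, b)  # xs[k-1] < b <= xs[k]
        return ys[k - 1] + (ys[k] - ys[k - 1]) * (b - xs[k - 1]) / (xs[k] - xs[k - 1])

    return curve

def approx_alpha(pairs_by_i, curve, omega):
    """(4.3.1): binomially weighted fraction of (beta, D) pairs lying above the curve."""
    n = len(pairs_by_i) - 1
    total = 0.0
    for i, pairs in enumerate(pairs_by_i):
        above = sum(D > curve(beta) for beta, D in pairs) / len(pairs)
        total += comb(n, i) * omega**i * (1 - omega)**(n - i) * above
    return total

# Tiny illustration with made-up (beta, D) pairs for n = 1.
curve = make_curve([(0.0, 0.2), (1.0, 0.4)])
pairs_by_i = [[(0.0, 0.1), (0.0, 0.3)], [(1.0, 0.5), (1.0, 0.3)]]
print(approx_alpha(pairs_by_i, curve, omega=0.5))
```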

Tables 4.5-4.7 show that:

(1) In general, as the sample size n increases, the approximate values of the actual significance level appear to approach the nominal significance level α. This is as expected, since the performance of the test statistic usually improves as n increases.

(2) When ω is not too close to either 0 or 1, the values of $\bar\alpha$ remain relatively constant. When ω is close to either 0 or 1, the small number of random variates generated for one component causes a great deal of variability among the 200 combined samples, which in turn causes the values of D to become larger. Hence at both ends of [0,1] we tend to have larger values of $\bar\alpha$.

(3) In the cases of two normal components, corresponding values in Tables 4.5 and 4.6 are fairly close to each other. This indicates that if we use the statistic D to test (4.1.2) and reject $H_0$ when D is too large, then this test would be consistent with respect to the distance between the components.

(4) The values of $\bar\alpha$ are closer to the nominal significance levels in both normal cases than in the exponential-normal case.


Table 4.5 Approximate values $\bar\alpha$ of the actual significance levels of the
statistic D when the components are N(-1.5,1), N(1.5,1) respectively
(When ω > .5, use the values corresponding to 1 - ω.)


α = .05


α = .025


20 

40 

n = 10 

20 

40 

.0800 

.0775 

.0325 

.0425 

.0550 

.0685 

.0695 

.0287 

.0309 

.0389 

.0554 

.0526 

.0236 

.0262 

.0298 

.0462 

.0446 

.0205 

.0226 

.0238 

.0406 

.0433 

.0166 

.0192 

.0227 

.0375 

.0448 

.0132 

.0166 

.0237 

.0360 

.0475 

.0104 

.0151 

.0246 

.0353 

.0501 

.0084 

.0142 

.0247 

.0347 

.0512 

.0070 

.0126 

.0233 

.0342 

.0510 

.0062 

.0131 

.0213 

.0340 

.0507 

.0060 

.0130 

.0202 

Table 4.6 Approximate values $\bar\alpha$ of the actual significance levels of the
statistic D when the components are N(-2,1), N(2,1) respectively
(When ω > .5, use the values corresponding to 1 - ω.)


20 

n = 10 

20 

3825 

.0500 

.0550 

.0350 

3711 

.0340 

.0420 

.0170 

3599 

.0291 

.0321 

.0141 

3497 

.0287 

.0247 

.0144 

3409 

.0296 

.0192 

.0152 

3338 

.0309 

.0153 

.0162 

3285 

.0322 

.0127 

.0173 

3247 

.0331 

.0112 

.0182 

3222 

.0335 

.0105 

.0186 

3209 

.0336 

.0101 

.0188 

3204 

.0336 

.0101 

.0188 

Table 4.7 Approximate values $\bar\alpha$ of the actual significance
levels of the statistic D when the components
are E(1.5), N(1,1/16) respectively


α = .05  α = .025


n = 10 

20 

n = 10 

20 

ω

0 

.0925 

.0350 

.05 

.0667 

.0532 

.0255 

.0267 

.1 

.0508 

.0508 

.0202 

.0204 

.15 

.0403 

.0467 

.0171 

.0160 

.2 

.0331 

.0416 

.0153 

.0131 

.25 

.0279 

.0364 

.0139 

.0116 

.3 

.0242 

.0318 

.0129 

.0108 

.35 

.0215 

.0277 

.0120 

.0101 

.4 

.0197 

.0243 

.0111 

.0095 

.45 

.0188 

.0222 

.0103 

.0093 

.5 

.0186 

.0217 

.0094 

.0099 

.55 

.0192 

.0229 

.0090 

.0113 

.6 

.0206 

.0253 

.0088 

.0134 

.65 

.0227 

.0282 

.0089 

.0154 

.7 

.0258 

.0308 

.0096 

.0168 

.75 

.0297 

.0327 

.0111 

.0175 

.8 

.0345 

.0345 

.0136 

.0183 

.85 

.0401 

.0384 

.0176 

.0211 

.9 

.0463 

.0476 

.0235 

.0277 

.95 

.0529 

.0656 

.0316 

.0376 

1.0 

.0600 

.095 

.0425 

.0500 

CHAPTER  V 


MISCELLANEOUS  PROBLEMS 

In  this  chapter  we  study  various  problems  which  have  not  been 
discussed  in  the  previous  chapters. 


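5.1 Problem B2: Normal Components

Let $X_1, \ldots, X_n$ be a random sample from $F_0$ and suppose that $F_1$ and $F_2$ are of known form and symmetric, with $F_2(x) = F_1(x - t)$, $t > 0$ known. Assume also that

$$F_0 = \omega F_1 + (1 - \omega) F_2 \quad \text{properly.} \qquad (5.1.1)$$

We wish to test the following null hypothesis:

$$H_0: \omega = 0 \text{ or } 1. \qquad (5.1.2)$$

Assume further that $F_i$ is distributed as $N(m_i, \sigma^2)$, $i = 1, 2$ (note that $m_2 - m_1 = t > 0$). Behboodian (1972) shows that under (5.1.1), the sample mean $\bar X_n = \frac1n \sum_{i=1}^n X_i$ has density

$$f(x, \omega) = \sum_{i=0}^n \binom ni \omega^i (1 - \omega)^{n-i} \frac{\sqrt n}{\sigma\sqrt{2\pi}} \exp\Big\{-\frac{n}{2\sigma^2}\Big[x - m_2 + \frac in t\Big]^2\Big\}. \qquad (5.1.3)$$

It follows that, for $0 < \omega < 1$,

$$\frac{f(x, \omega)}{f(x, 0)} \text{ is a decreasing function of } x, \quad \text{and} \quad \frac{f(x, \omega)}{f(x, 1)} \text{ is an increasing function of } x. \qquad (5.1.4)$$

Therefore a test can be formulated as follows:

$$\text{Reject } H_0 \text{ if } b < \bar X_n < a, \qquad (5.1.5)$$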
where $a$, $b$ are chosen such that

$$\int_{-\infty}^a f(x, 0)\,dx = \Phi\Big(\frac{a - m_2}{\sigma/\sqrt n}\Big) = \alpha/2, \qquad (5.1.6)$$

$$\int_{-\infty}^b f(x, 1)\,dx = \Phi\Big(\frac{b - m_1}{\sigma/\sqrt n}\Big) = 1 - \alpha/2, \qquad (5.1.7)$$

and $\alpha$ is the level of significance.

The above method can be applied whenever there exists a statistic with density satisfying (5.1.4).
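The following minimal Python sketch implements the test (5.1.5) to (5.1.7), using `statistics.NormalDist` from the standard library. It also evaluates the density (5.1.3), so the monotone ratio property (5.1.4) can be checked on a grid. The numerical values in the usage lines are arbitrary. Note that the rejection region is nonempty only when $b < a$, i.e. when $t > 2z_{1-\alpha/2}\,\sigma/\sqrt n$.

```python
from math import comb, exp, pi, sqrt
from statistics import NormalDist

def critical_values(m1, m2, sigma, n, alpha):
    """a and b of (5.1.6)-(5.1.7) for F1 = N(m1, sigma^2), F2 = N(m2, sigma^2), m1 < m2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    a = m2 - z * se  # P(Xbar <= a | omega = 0) = alpha/2, since Xbar ~ N(m2, se^2)
    b = m1 + z * se  # P(Xbar <= b | omega = 1) = 1 - alpha/2, since Xbar ~ N(m1, se^2)
    return a, b

def reject_h0(xbar, a, b):
    """(5.1.5): reject H0 (omega = 0 or 1) when b < Xbar < a."""
    return b < xbar < a

def density_xbar(x, omega, m2, t, sigma, n):
    """Density (5.1.3) of the sample mean under the mixture (5.1.1)."""
    c = sqrt(n) / (sigma * sqrt(2 * pi))
    return sum(comb(n, i) * omega**i * (1 - omega)**(n - i)
               * c * exp(-n / (2 * sigma**2) * (x - m2 + i * t / n) ** 2)
               for i in range(n + 1))

# Usage: F1 = N(-2, 1), F2 = N(2, 1) (so t = 4), n = 10, alpha = .05.
a, b = critical_values(-2.0, 2.0, 1.0, 10, 0.05)
print(round(b, 4), round(a, 4))
print(reject_h0(0.0, a, b), reject_h0(1.9, a, b))
```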


5.2 Problem B5

Let $X_1, \ldots, X_n$ be a random sample from $F_0$ and suppose that $F_1$ and $F_2$ are of known forms, with densities $f_1$ and $f_2$, respectively. Also assume that

$$F_0 = \omega F_1 + (1 - \omega) F_2 \quad \text{properly,} \qquad (5.2.1)$$

so that $F_0$ has density $f_0 = \omega f_1 + (1 - \omega) f_2$. We wish to test the following null hypothesis:

$$H_0: \omega \le \omega_0$$

against the alternative hypothesis

$$H_1: \omega \ge \omega_1,$$

where $0 \le \omega_0 < \omega_1 \le 1$. First let us state a lemma.

Lemma 1

Let $X_1, \ldots, X_n$ be a random sample from $F_0$. Assume that $F_0$ has density $f_0(x, \omega)$, where $\omega$ is a real-valued parameter, and that, for $\omega < \omega'$,

$$\frac{f_0(x, \omega')}{f_0(x, \omega)} \text{ is a nondecreasing function of } x. \qquad (5.2.2)$$

Then the test

$$\phi(X_1, \ldots, X_n) = \begin{cases} 1 & \text{if } L_n(X_1, \ldots, X_n; \omega_0, \omega_1) > c, \\ r & \text{if } L_n(X_1, \ldots, X_n; \omega_0, \omega_1) = c, \\ 0 & \text{if } L_n(X_1, \ldots, X_n; \omega_0, \omega_1) < c, \end{cases} \qquad (5.2.3)$$

where

$$L_n(X_1, \ldots, X_n; \omega, \omega') = \prod_{i=1}^n \frac{f_0(X_i, \omega')}{f_0(X_i, \omega)}, \qquad (5.2.4)$$

and $c$ and $r$ ($0 \le r \le 1$) are constants determined by $E_{\omega_0}\phi(X_1, \ldots, X_n) = \alpha$, maximizes the minimum power for testing $H_0$ against $H_1$.

Proof: See Lehmann (1959), p. 330.

Lemma 1 can be extended by slightly altering assumption (5.2.2), as follows:

Lemma 2

Suppose that in Lemma 1, instead of (5.2.2), we assume that for $\omega < \omega'$,

$$\frac{f_0(x, \omega')}{f_0(x, \omega)} \text{ is a nondecreasing function of some suitably chosen function } t(x). \qquad (5.2.5)$$

Then the test

$$\psi(t(X_1), \ldots, t(X_n)) = \begin{cases} 1 & \text{if } L_n(t(X_1), \ldots, t(X_n); \omega_0, \omega_1) > c, \\ r & \text{if } L_n(t(X_1), \ldots, t(X_n); \omega_0, \omega_1) = c, \\ 0 & \text{if } L_n(t(X_1), \ldots, t(X_n); \omega_0, \omega_1) < c, \end{cases} \qquad (5.2.6)$$

where

$$L_n(t(X_1), \ldots, t(X_n); \omega, \omega') = \prod_{i=1}^n \frac{f_0(X_i, \omega')}{f_0(X_i, \omega)},$$

and $c$ and $r$ ($0 \le r \le 1$) are constants determined by $E_{\omega_0}\psi(t(X_1), \ldots, t(X_n)) = \alpha$, maximizes the minimum power for testing $H_0$ against $H_1$.

Proof: A slight modification of Lehmann's proof for Lemma 1.
In the case of a 2-component mixture, the density of $F_0$ is of the form
$f_0 = \omega f_1 + (1-\omega) f_2$. In order to apply Lemma 1 or Lemma 2, $f_0(x,\omega)$ must satisfy
assumption (5.2.2) or (5.2.5). Note that we may write

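$$\frac{f_0(x,\omega')}{f_0(x,\omega)} = \frac{f_2(x) + \omega'[f_1(x) - f_2(x)]}{f_2(x) + \omega[f_1(x) - f_2(x)]} = \frac{1 + \omega'\left[\dfrac{f_1(x)}{f_2(x)} - 1\right]}{1 + \omega\left[\dfrac{f_1(x)}{f_2(x)} - 1\right]}. \qquad (5.2.7)$$

For $\omega < \omega'$, this is a nondecreasing function of $f_1(x)/f_2(x)$: writing $u = f_1(x)/f_2(x)$, its derivative with respect to $u$ is $(\omega' - \omega)/[1 + \omega(u-1)]^2 \ge 0$. Hence if we let

$$t(x) = f_1(x)/f_2(x), \qquad (5.2.8)$$

we may apply Lemma 2 to obtain a test (namely (5.2.6)) which maximizes the minimum power for testing $H_0$ against $H_1$. If, further, $f_1(x)/f_2(x)$ is a nondecreasing function of another suitably chosen function $t^*(x)$, then $f_0(x,\omega')/f_0(x,\omega)$ is also a nondecreasing function of $t^*(x)$, and Lemma 2 can be applied with $t^*$ in place of $t$.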
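As a rough illustration of the test (5.2.6) for two specified components, the following Python sketch computes $\log L_n$ of (5.2.4) and approximates the constant $c$ by Monte Carlo simulation under $\omega = \omega_0$. The normal components, the sample size and the names `log_lr` and `mc_critical_value` are hypothetical choices for the example, not part of the original development. Because the statistic is continuous here, the randomization constant $r$ plays no role and is omitted, and $c$ is handled on the log scale.

```python
import math
import random

def normal_pdf(x, mu, sd=1.0):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log_lr(xs, w0, w1, f1, f2):
    """log L_n(x_1..x_n; w0, w1) of (5.2.4) for the mixture density w f1 + (1 - w) f2."""
    return sum(math.log((w1 * f1(x) + (1 - w1) * f2(x)) /
                        (w0 * f1(x) + (1 - w0) * f2(x))) for x in xs)

def mc_critical_value(n, w0, w1, f1, f2, draw1, draw2, alpha=0.05, reps=2000, seed=0):
    """Approximate log c with P_{w0}(log L_n > log c) close to alpha, simulating under w = w0."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        xs = [draw1(rng) if rng.random() < w0 else draw2(rng) for _ in range(n)]
        stats.append(log_lr(xs, w0, w1, f1, f2))
    stats.sort()
    k = min(reps - 1, math.ceil((1 - alpha) * reps) - 1)  # empirical (1 - alpha) quantile
    return stats[k]

# Hypothetical components: f1 = N(2, 1), f2 = N(0, 1); H0: w <= 0.2 against H1: w >= 0.5.
f1 = lambda x: normal_pdf(x, 2.0)
f2 = lambda x: normal_pdf(x, 0.0)
log_c = mc_critical_value(30, 0.2, 0.5, f1, f2,
                          lambda r: r.gauss(2.0, 1.0), lambda r: r.gauss(0.0, 1.0))
rng = random.Random(42)
sample = [rng.gauss(2.0, 1.0) if rng.random() < 0.5 else rng.gauss(0.0, 1.0) for _ in range(30)]
print("reject H0" if log_lr(sample, 0.2, 0.5, f1, f2) > log_c else "accept H0")
```

Simulating the null distribution is only one way to fix $c$; with the data entering solely through $t(X_i) = f_1(X_i)/f_2(X_i)$, any method that gives the $(1-\alpha)$ point of $\log L_n$ under $\omega_0$ would serve.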


5.3 Problems B4 and B5

Let $X_1, \dots, X_n$ be a random sample from $F_0$ and suppose that $F_1$ is known,
$F_2(x) = F_1(x-t)$ (in problem B4) or $F_2(x) = F_1(x/t)$ (in problem B5), with $t$
unknown. We wish to test the following null hypothesis:

$$H_0: F_0 = \omega F_1 + (1-\omega)F_2.$$

Owing to the presence of the nuisance parameter $t$, we cannot apply the
methods discussed in sections 2.1, 3.1, 3.2 or 3.3. Behboodian (1976) uses
a method of moments to estimate the (unknown) parameters $\omega$ and $t$. We could,
of course, use this estimated value of $t$ and apply the methods discussed
in the previous sections, as if both components were known to have these
estimated values, to derive test statistics. In the following we will explore
another approach to problem B4.

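Denote by $F_{0n}$ the empirical distribution function corresponding to the random sample $X_1, \dots, X_n$, and by $X_{(1)} \le \dots \le X_{(n)}$ its order statistics. Without loss of generality, in the following we will assume $t > 0$ (if $t < 0$, the following still holds after exchanging $F_1(x)$ and $F_2(x)$). Then $F_2(x) = F_1(x-t) \le F_1(x)$, and

$$F_0(x) = F_1(x) + (1-\omega)[F_2(x) - F_1(x)] \le F_1(x). \qquad (5.3.1)$$

It follows that, for $i = 0, 1, \dots, n$,

$$\sup_{X_{(i)} \le x < X_{(i+1)}} |F_{0n}(x) - F_0(x)| \ge \max\left(\frac{i}{n} - F_1(X_{(i)}),\ 0\right),$$

where $X_{(0)} = -\infty$ and $X_{(n+1)} = +\infty$. Therefore

$$\sup_{-\infty<x<\infty} |F_{0n}(x) - F_0(x)| \ge \max_{i=1,\dots,n}\left(\frac{i}{n} - F_1(X_{(i)})\right)$$

and then

$$D = \inf_{0\le\omega\le1}\ \sup_{-\infty<x<\infty} |F_{0n}(x) - F_0(x)| \ge \max_{i=1,\dots,n}\left(\frac{i}{n} - F_1(X_{(i)})\right) = D_n^+, \qquad (5.3.2)$$

where $D_n^+$ is the (one-sided) Kolmogorov-Smirnov statistic of the sample with respect to $F_1$, the distribution of which can be used along with (5.3.2) to obtain a test for $H_0$.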
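If $D$ is computed at the true value of $t$, then under $H_0$ the true $F_0$ is one of the mixtures over which the infimum is taken, so $D$ cannot exceed the Kolmogorov-Smirnov distance between $F_{0n}$ and $F_0$. Since the bound in (5.3.2) involves neither $\omega$ nor $t$, rejecting when it exceeds an upper-$\alpha$ point for that distance gives a test of level at most $\alpha$. The Python sketch below computes the bound. In place of exact Kolmogorov-Smirnov tables it uses the Dvoretzky-Kiefer-Wolfowitz inequality with Massart's constant, $\Pr\{\sup_x|F_{0n}(x)-F_0(x)| > c\} \le 2e^{-2nc^2}$, to obtain a conservative critical value. That substitution, the normal $F_1$ and the function names are assumptions of the example.

```python
import math
import random

def normal_cdf(x, mu=0.0):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def d_plus_bound(sample, F1):
    """max_i (i/n - F1(X_(i))): the computable lower bound for D in (5.3.2)."""
    xs = sorted(sample)
    n = len(xs)
    return max((i + 1) / n - F1(x) for i, x in enumerate(xs))

def dkw_critical_value(n, alpha=0.05):
    """c solving 2 exp(-2 n c^2) = alpha (DKW inequality with Massart's constant)."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

# Hypothetical problem B4 data: F1 = N(0, 1), shift t = 1.5, omega = 0.4.
rng = random.Random(7)
n = 200
sample = [rng.gauss(0.0, 1.0) if rng.random() < 0.4 else rng.gauss(1.5, 1.0) for _ in range(n)]
stat = d_plus_bound(sample, normal_cdf)
print(stat, dkw_critical_value(n), stat > dkw_critical_value(n))
```

The DKW bound is deliberately conservative; exact Kolmogorov-Smirnov percentage points would give a sharper test at the same level.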

5.4 Problems C1, C2, C3

Let $X_{01}, \dots, X_{0n}$ be a random sample from $F_0$ and suppose that, for
$i = 1, 2$, $F_i$ is continuous and is either known or there is a random sample
from it. We wish to test the following null hypothesis:

$$H_0: F_0 = \omega F_1 + (1-\omega)F_2 \ \text{ for some } \omega \text{ in } [a,b], \text{ with } 0 < a < b < 1 \text{ known}. \qquad (5.4.1)$$

We show in the following that we can use the test statistics $T_2$, $T_3$, $T_4$,
discussed in sections 3.1-3.3, for problems C1, C2, C3,
respectively.

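Define

$$F_1^* = bF_1 + (1-b)F_2, \qquad F_2^* = aF_1 + (1-a)F_2. \qquad (5.4.2)$$

Substituting these expressions into (5.4.1) and simplifying, we may express $F_0$ as

$$F_0 = \frac{\omega - a}{b - a}\,F_1^* + \frac{b - \omega}{b - a}\,F_2^*.$$

Let

$$\omega^* = \frac{\omega - a}{b - a}; \qquad (5.4.3)$$

then $F_0 = \omega^* F_1^* + (1 - \omega^*)F_2^*$, with $0 \le \omega^* \le 1$. Since both $F_1^*$ and $F_2^*$ are distribution functions, $H_0$ can be reformulated as

$$H_0^*: F_0 = \omega^* F_1^* + (1-\omega^*)F_2^* \quad \text{properly}. \qquad (5.4.4)$$

In other words, $H_0$ is true if and only if $H_0^*$ is true.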
Define

$$W_{i0}^* = \int F_i^*\,dF_0, \quad i = 1, 2, \qquad (5.4.5)$$

$$W_{12}^* = \int F_1^*\,dF_2^*. \qquad (5.4.6)$$

From the assumption of continuity of $F_1$, $F_2$ and the definitions, $F_1^*$ and $F_2^*$ are continuous; hence we have

$$W_{0i}^* = 1 - W_{i0}^*, \qquad W_{21}^* = 1 - W_{12}^*.$$

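Following the same argument as in section 3.1, under $H_0^*$ we have the necessary condition

$$W_{10}^* + W_{02}^* - W_{12}^* - \tfrac{1}{2} = 0. \qquad (5.4.7)$$

By definition, $W_{10}^* = bW_{10} + (1-b)W_{20}$, $W_{20}^* = aW_{10} + (1-a)W_{20}$, and

$$W_{12}^* = \tfrac{1}{2}(1 + a - b) + (b-a)W_{12}, \qquad (5.4.8)$$

where $W_{12}$ is defined by (3.1.3); hence, using $W_{02}^* = 1 - W_{20}^*$ and $W_{20} = 1 - W_{02}$, we may rewrite (5.4.7) as

$$W_{10}^* + W_{02}^* - W_{12}^* - \tfrac{1}{2} = (b-a)\left(W_{10} + W_{02} - W_{12} - \tfrac{1}{2}\right) = 0.$$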
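The identity (5.4.8) and the rewritten form of (5.4.7) can be checked numerically. The Python sketch below reads $W_{ij}$ as $\int F_i\,dF_j$ (an assumption about (3.1.3)). It takes hypothetical components $F_1 = N(0,1)$, $F_2 = N(2,1)$ and $[a,b] = [0.3, 0.7]$, and evaluates the integrals with the trapezoid rule. The function `check` returns the discrepancy in (5.4.8) and both sides of the rewritten (5.4.7). It is applied once to an $F_0$ satisfying $H_0$ and once to $F_0 = N(5,1)$, which is not such a mixture.

```python
import math

def Phi(x, mu=0.0):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def phi(x, mu=0.0):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def W(F, g, lo=-12.0, hi=15.0, m=10000):
    """Trapezoid approximation of the integral of F dG = F(x) g(x) dx, g the density of G."""
    h = (hi - lo) / m
    total = 0.5 * (F(lo) * g(lo) + F(hi) * g(hi))
    total += sum(F(lo + k * h) * g(lo + k * h) for k in range(1, m))
    return total * h

a, b = 0.3, 0.7                                            # hypothetical known interval [a, b]
F1, f1 = (lambda x: Phi(x)), (lambda x: phi(x))            # hypothetical F1 = N(0, 1)
F2, f2 = (lambda x: Phi(x, 2.0)), (lambda x: phi(x, 2.0))  # hypothetical F2 = N(2, 1)
F1s = lambda x: b * F1(x) + (1 - b) * F2(x)                # F1* of (5.4.2)
F2s = lambda x: a * F1(x) + (1 - a) * F2(x)                # F2* of (5.4.2)
f2s = lambda x: a * f1(x) + (1 - a) * f2(x)                # density of F2*

def check(F0, f0):
    """Return (gap in (5.4.8), LHS and RHS of the rewritten (5.4.7))."""
    W12, W12s = W(F1, f2), W(F1s, f2s)
    gap = W12s - ((1 + a - b) / 2 + (b - a) * W12)
    lhs = W(F1s, f0) + W(F0, f2s) - W12s - 0.5
    rhs = (b - a) * (W(F1, f0) + W(F0, f2) - W12 - 0.5)
    return gap, lhs, rhs

w = 0.5  # omega in [a, b], so H0 holds
print(check(lambda x: w * F1(x) + (1 - w) * F2(x), lambda x: w * f1(x) + (1 - w) * f2(x)))
print(check(lambda x: Phi(x, 5.0), lambda x: phi(x, 5.0)))  # not a mixture of F1 and F2
```

In the first case all three numbers are close to zero. In the second case the left- and right-hand sides still agree, because the rewriting of (5.4.7) is an algebraic identity, but they are far from zero.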


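Define

$$\begin{aligned} T_2^* &= (b-a)T_2 = (b-a)\left(W_{10} + W_{02} - W_{12} - \tfrac{1}{2}\right), \\ T_3^* &= (b-a)T_3 = (b-a)\left(W_{10} + R_{02} - R_{12} - \tfrac{1}{2}\right), \\ T_4^* &= (b-a)T_4 = (b-a)\left(R_{10} + R_{02} - R_{12} - \tfrac{1}{2}\right), \end{aligned} \qquad (5.4.9)$$

where $W_{ij}$ is defined by (3.1.2) and $R_{ij}$ by (3.3.1). The variances of $T_2^*$, $T_3^*$, $T_4^*$ can be derived by using the variances of $T_2$, $T_3$, $T_4$, respectively. Similarly, consistent estimators of these variances can be derived from those of $T_2$, $T_3$, $T_4$ by multiplying by the factor $(b-a)^2$.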
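Then large-sample tests can be formulated as follows:

(1) Problem C1

Reject $H_0$ if $|T_2^*|\,(\widehat{\mathrm{Var}}\,T_2^*)^{-1/2} > z_{1-\alpha/2}$, (5.4.10)

(2) Problem C2

Reject $H_0$ if $|T_3^*|\,(\widehat{\mathrm{Var}}\,T_3^*)^{-1/2} > z_{1-\alpha/2}$, (5.4.11)

(3) Problem C3

Reject $H_0$ if $|T_4^*|\,(\widehat{\mathrm{Var}}\,T_4^*)^{-1/2} > z_{1-\alpha/2}$, (5.4.12)

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)\times 100$ percentile of the standard normal
distribution.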
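The decision rules (5.4.10)-(5.4.12) share one form, sketched below in Python. The function name is hypothetical. The inputs are a computed value of $T_2$, $T_3$ or $T_4$ and its estimated variance, assumed to come from the formulas of sections 3.1-3.3, which are not reproduced here. `statistics.NormalDist().inv_cdf` supplies $z_{1-\alpha/2}$.

```python
import math
from statistics import NormalDist

def large_sample_test(T, var_T_hat, a, b, alpha=0.05):
    """Tests (5.4.10)-(5.4.12): reject H0 if |T*| / sqrt(Var^ T*) > z_{1 - alpha/2}.

    T is the value of T2, T3 or T4 and var_T_hat its estimated variance;
    T* = (b - a) T and Var^ T* = (b - a)^2 Var^ T, as in (5.4.9).
    """
    T_star = (b - a) * T
    var_star = (b - a) ** 2 * var_T_hat
    z = abs(T_star) / math.sqrt(var_star)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z > z_crit, z

print(large_sample_test(0.1, 0.0004, 0.2, 0.6))  # hypothetical T and estimated variance
```

Because the factor $b-a$ appears both in $T^*$ and in the square root of its estimated variance, it cancels, so $z$ equals $|T|(\widehat{\mathrm{Var}}\,T)^{-1/2}$.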

5.5 Problems D1, D2, D3

Suppose that each of $F_0$, $F_1$ is continuous and is either known or
there is a random sample from it, and assume that

$$F_0 = \omega F_1 + (1-\omega)F_2 \quad \text{properly}. \qquad (5.5.1)$$

We wish to test the null hypothesis:

$$H_0: F_2 = F, \ \text{a specified d.f.}$$

Before discussing how to test $H_0$, let us first analyze the testing
problem a little further. Given $F_0$ and $F_1$, does there exist a real
number $\omega$ in $[0,1]$ such that $F_0 - \omega F_1$ is a nondecreasing function? The
answer to this question is contained in the following:

Proposition  5.1 


Let $F_0$, $F_1$ be two cumulative distribution functions satisfying the condition that whenever $F_0(x') = F_0(x)$ for $x < x'$, it follows that $F_1(x') = F_1(x)$. Define

$$G(x,\omega) = F_0(x) - \omega F_1(x); \qquad (5.5.2)$$

then $G(x,\omega)$ is nondecreasing in $x$ if and only if

$$\omega \le \omega_0, \qquad (5.5.3)$$

where

$$\omega_0 = \inf\left\{ \frac{F_0(x') - F_0(x)}{F_1(x') - F_1(x)} : x' > x,\ F_1(x') > F_1(x) \right\}. \qquad (5.5.4)$$

If, furthermore, $F_0$, $F_1$ possess densities $f_0$, $f_1$ respectively, such that whenever $f_0(x) = 0$ it follows that $f_1(x) = 0$, then $G(x,\omega)$ is nondecreasing if and only if

$$\omega \le \inf\left\{ \frac{f_0(x)}{f_1(x)} : f_1(x) > 0 \right\}.$$

Remark: If $F_0$ is a proper mixture of $F_1$ and $F_2$, then the mixing proportion $\omega$ must satisfy

$$0 \le \omega \le \omega_0.$$

On the other hand, by a similar argument, Proposition 5.1 can be applied to

$$H(x,\omega) = F_0(x) - (1-\omega)F_2(x),$$

and then $\omega$ must also satisfy

$$\omega \ge 1 - \omega_1,$$

where

$$\omega_1 = \inf\left\{ \frac{F_0(x') - F_0(x)}{F_2(x') - F_2(x)} : F_2(x') > F_2(x),\ x' > x \right\}. \qquad (5.5.5)$$

In order for $\omega$ to be unique, we then must have

$$\omega_0 = 1 - \omega_1, \qquad (5.5.7)$$

which can be used to derive test statistics for testing the hypothesis of mixture. (Not shown in this paper.)

Proof:

Suppose that $G(x,\omega)$ is nondecreasing; then for $x < x'$ such that $F_1(x') > F_1(x)$, we have

$$\omega \le \frac{F_0(x') - F_0(x)}{F_1(x') - F_1(x)}.$$

It follows, by taking the infimum, that $\omega \le \omega_0$. Conversely, if $\omega \le \omega_0$, then for $x' > x$ with $F_1(x') > F_1(x)$, we have

$$\omega \le \omega_0 \le \frac{F_0(x') - F_0(x)}{F_1(x') - F_1(x)}, \quad \text{or} \quad G(x',\omega) \ge G(x,\omega).$$

For $x' > x$ with $F_1(x') = F_1(x)$, we have

$$G(x',\omega) - G(x,\omega) = F_0(x') - F_0(x) \ge 0.$$

If, furthermore, $F_0$, $F_1$ have densities $f_0$, $f_1$, respectively, then $G(x,\omega)$ is nondecreasing if and only if

$$f_0(x) - \omega f_1(x) \ge 0 \quad \text{for all } x.$$

The rest of the proposition follows.

q.e.d.
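When densities exist, Proposition 5.1 and the Remark reduce $\omega_0$ and $\omega_1$ to infima of density ratios. The Python sketch below approximates them on a finite grid for a hypothetical mixture $F_0 = 0.4\,N(0,1) + 0.6\,N(3,1)$, with $F_1 = N(0,1)$ and $F_2 = N(3,1)$. Here both infima are approached in the tails, giving $\omega_0 \approx 0.4$ and $1 - \omega_1 \approx 0.4$, so condition (5.5.7) holds. A grid minimum only approximates an infimum and can overstate it when the infimum is approached outside the grid. The grid and the function names are choices made for the example.

```python
import math

def npdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def inf_density_ratio(f_num, f_den, grid):
    """Grid approximation of inf{ f_num(x) / f_den(x) : f_den(x) > 0 }."""
    return min(f_num(x) / f_den(x) for x in grid if f_den(x) > 0)

w = 0.4                                          # hypothetical mixing proportion
f1 = lambda x: npdf(x, 0.0)
f2 = lambda x: npdf(x, 3.0)
f0 = lambda x: w * f1(x) + (1 - w) * f2(x)
grid = [-10.0 + 0.01 * k for k in range(2001)]   # x from -10 to 10

omega0 = inf_density_ratio(f0, f1, grid)   # omega <= omega0 (Proposition 5.1)
omega1 = inf_density_ratio(f0, f2, grid)   # omega >= 1 - omega1 (Remark)
print(omega0, 1 - omega1)                  # both close to 0.4 in this example
```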
From (5.5.1), $F_2$ can be expressed as

$$F_2 = (1-\omega)^{-1}(F_0 - \omega F_1). \qquad (5.5.8)$$

Conversely, if there exists an $\omega$ in $(0,1)$ such that $F_0 - \omega F_1$ is nondecreasing, then $(1-\omega)^{-1}(F_0 - \omega F_1)$ would be a distribution function, and $F_0$ would be a mixture of $F_1$ and $(1-\omega)^{-1}(F_0 - \omega F_1)$. By Proposition 5.1 there might be infinitely many such $\omega$, and the null hypothesis $H_0: F_2 = F$ states that there is one $\omega$ which makes $(1-\omega)^{-1}(F_0 - \omega F_1)$ equal to $F$ (a.s.).

Now under $H_0$ we have

$$F_0 = \omega F_1 + (1-\omega)F,$$

and so we can use the statistics $T_3$ and $T_4$, discussed in sections 3.2 and 3.3, to test $H_0$.

Another possible line of attack would be to use the following statistics, as in Chapter IV:

$$\inf_{0\le\omega\le1}\ \sup_x \left|F_{0n_0}(x) - \omega F_{1n_1}(x) - (1-\omega)F(x)\right| \quad \text{(for problem D1)},$$

$$\inf_{0\le\omega\le1}\ \sup_x \left|F_{0n_0}(x) - \omega F_1(x) - (1-\omega)F(x)\right| \quad \text{(for problem D2)},$$

$$\inf_{0\le\omega\le1}\ \sup_x \left|F_0(x) - \omega F_{1n_1}(x) - (1-\omega)F(x)\right| \quad \text{(for problem D3)},$$

where $F_{in_i}(x)$ denotes the empirical distribution function of the sample from $F_i$.
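The problem D2 statistic can be computed directly. For a continuous candidate $G_\omega = \omega F_1 + (1-\omega)F$, the supremum of $|F_{0n}(x) - G_\omega(x)|$ is attained at the order statistics. As a function of $\omega$ it is then a maximum of absolute values of affine functions, and hence convex, so the Python sketch below minimizes over $\omega$ by ternary search. This illustrative computation rests on those two observations. It is not the algorithm of Chapter IV, and the function names and uniform example are hypothetical.

```python
def ks_distance(xs_sorted, G):
    """sup_x |F_n(x) - G(x)| for continuous G, evaluated at the order statistics."""
    n = len(xs_sorted)
    return max(max((i + 1) / n - G(x), G(x) - i / n) for i, x in enumerate(xs_sorted))

def d2_statistic(sample, F1, F, iters=100):
    """inf over 0 <= w <= 1 of sup_x |F_0n(x) - w F1(x) - (1 - w) F(x)| (problem D2)."""
    xs = sorted(sample)

    def objective(w):
        return ks_distance(xs, lambda x: w * F1(x) + (1 - w) * F(x))

    lo, hi = 0.0, 1.0
    for _ in range(iters):          # ternary search; valid because objective is convex in w
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    w_hat = (lo + hi) / 2
    return objective(w_hat), w_hat

# Hypothetical example: F1 = U(0, 1), F = U(0, 2), a small sample from F0.
U01 = lambda x: min(max(x, 0.0), 1.0)
U02 = lambda x: min(max(x / 2.0, 0.0), 1.0)
print(d2_statistic([0.05, 0.3, 0.45, 0.8, 1.2, 1.7], U01, U02))
```

The minimizing $\hat\omega$ is returned as well, since it is the natural estimate of the mixing proportion under $H_0$.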


5.6 Problem F1

In this section and the next two sections, we study problems of
testing two mixtures simultaneously. Let $\{X_{0i}^a\}$ and $\{X_{0i}^b\}$ be
random samples from $F_{a0}$ and $F_{b0}$ with sample sizes $n_{a0}$, $n_{b0}$ respectively.
Assume that all the $X$'s are mutually independent. Suppose that each
of $F_{a1}$, $F_{a2}$, $F_{b1}$, $F_{b2}$ is continuous and is either known or there is
a random sample from it. We wish to test the following null hypothesis:

$$H_0: F_{a0} = \omega_a F_{a1} + (1-\omega_a)F_{a2} \quad \text{properly} \qquad (5.6.1)$$

and

$$F_{b0} = \omega_b F_{b1} + (1-\omega_b)F_{b2} \quad \text{properly}. \qquad (5.6.2)$$

For each single mixture we may use the test statistics $T_2$, $T_3$ or $T_4$
discussed in sections 3.1-3.3.

Since all the samples under consideration are assumed to be
independent, we can combine the test statistics for the single mixtures
to form a test statistic for $H_0$. In the following we use an example
to illustrate how the statistic can be derived.

Suppose that $F_{a1}$ is known and for each of $F_{a2}$, $F_{b1}$, $F_{b2}$ there is a random sample $\{X_{2i}^a\}$, $\{X_{1i}^b\}$, $\{X_{2i}^b\}$, respectively. Also assume that all the samples are mutually independent. For testing (5.6.1) we use

$$T_3^a = R_{10}^a + W_{02}^a - R_{12}^a - \tfrac{1}{2}, \qquad (5.6.3)$$

where $R_{ij}^a$ is defined similarly as in (3.2.1) and $W_{02}^a$ as in (3.1.2). For testing (5.6.2) we use

$$T_2^b = W_{10}^b + W_{02}^b - W_{12}^b - \tfrac{1}{2}. \qquad (5.6.4)$$

$\mathrm{Var}\,T_3^a$, $\mathrm{Var}\,T_2^b$ and their consistent estimators $\widehat{\mathrm{Var}}\,T_3^a$, $\widehat{\mathrm{Var}}\,T_2^b$ can be derived similarly as in sections 3.3 and 3.2.

There are many possible methods of combining these two independent statistics, $T_3^a$ and $T_2^b$, to test $H_0$. We will consider three of them:
(1) As all $n$'s tend to $\infty$,

$$T_{10} = (T_3^a)^2(\widehat{\mathrm{Var}}\,T_3^a)^{-1} + (T_2^b)^2(\widehat{\mathrm{Var}}\,T_2^b)^{-1} \longrightarrow \chi_2^2 \qquad (5.6.5)$$

in distribution.

(2) Given observations $T_3^a = t^a$, $T_2^b = t^b$, define the following random variables:

$$-2\log \Pr\{T_3^a > t^a\}, \qquad -2\log \Pr\{T_2^b > t^b\};$$

then

$$T_{11} = -2\log \Pr\{T_3^a > t^a\} - 2\log \Pr\{T_2^b > t^b\} \qquad (5.6.6)$$

will have approximately a chi-square distribution with four degrees of freedom. (If $T_3^a$ and $T_2^b$ are continuous, this would be so exactly.)

(3) This will be discussed in section 5.7.
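Methods (1) and (2) can be sketched in Python as follows. Chi-square upper tails with even degrees of freedom have the closed form $e^{-x/2}\sum_{k<\nu/2}(x/2)^k/k!$, which covers both $\chi^2_2$ and $\chi^2_4$. For method (2), the sketch assumes that each $\Pr\{T > t\}$ is approximated by the upper tail of a normal distribution with mean 0 and the estimated variance. That is an assumption of this example, in the spirit of the large-sample theory of sections 3.1-3.3, and not a formula from the text. The function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

def chi2_sf_even(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

def method1(t_a, var_a, t_b, var_b):
    """T10 of (5.6.5) and its asymptotic chi-square(2) p-value."""
    T10 = t_a ** 2 / var_a + t_b ** 2 / var_b
    return T10, chi2_sf_even(T10, 2)

def method2(t_a, var_a, t_b, var_b):
    """T11 of (5.6.6), with Pr{T > t} approximated by a N(0, var) upper tail."""
    p_a = NormalDist(0.0, math.sqrt(var_a)).cdf(-t_a)   # = Pr{T3^a > t^a} under the approximation
    p_b = NormalDist(0.0, math.sqrt(var_b)).cdf(-t_b)   # = Pr{T2^b > t^b}
    T11 = -2.0 * math.log(p_a) - 2.0 * math.log(p_b)
    return T11, chi2_sf_even(T11, 4)

# Hypothetical observed values of T3^a, T2^b and their estimated variances.
print(method1(0.021, 0.0001, -0.004, 0.0002))
print(method2(0.021, 0.0001, -0.004, 0.0002))
```

Method (1) treats large deviations in either direction symmetrically, while method (2), as written in (5.6.6), uses one-sided tail probabilities.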

5.7  Problem  F2 

Let $\{X_{0i}^a\}$ and $\{X_{0i}^b\}$ be random samples from $F_{a0}$ and $F_{b0}$ with sizes
$n_{a0}$, $n_{b0}$. Assume that these two samples are independent.
Suppose that each of $F_{a1}$, $F_{a2}$, $F_{b2}$ is continuous and is either known
or there is a random sample from it. We wish to test the following null
hypothesis:

$$H_0: F_{a0} = \omega_a F_{a1} + (1-\omega_a)F_{a2} \quad \text{properly}, \qquad (5.7.1)$$

$$F_{b0} = \omega_b F_{a1} + (1-\omega_b)F_{b2} \quad \text{properly}. \qquad (5.7.2)$$

The difference between (5.7.1)-(5.7.2) and (5.6.1)-(5.6.2) is that $F_{a1}$
and $F_{b1}$ are not necessarily identical in (5.6.1)-(5.6.2), but are
identical in (5.7.1)-(5.7.2). If $F_{a1}$ is known, the method discussed in
section 5.6 can be applied here, but not if $F_{a1}$ is only given by a
random sample, for in this latter case the test statistics for the two
single mixtures are not necessarily (or usually) independent. Hence
we have to investigate their covariance structure. On the other hand,
the method given below can also be applied to the problem of section 5.6. (This is why it is
included as method (3) at the end of that section.)

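First suppose that there are random samples $\{X_{1i}^a\}$, $\{X_{2i}^a\}$, $\{X_{2i}^b\}$ of sizes $n_{a1}$, $n_{a2}$, $n_{b2}$ from $F_{a1}$, $F_{a2}$, $F_{b2}$, respectively, and assume that all samples are mutually independent. For testing (5.7.1) we use

$$T_2^a = W_{10}^a + W_{02}^a - W_{12}^a - \tfrac{1}{2}, \qquad (5.7.3)$$

where $W_{ij}^a$ is defined similarly as in (3.1.2). For testing (5.7.2) we use

$$T_2^b = W_{10}^b + W_{02}^b - W_{12}^b - \tfrac{1}{2}. \qquad (5.7.4)$$

Then $\mathrm{Var}\,T_2^a$ and $\mathrm{Var}\,T_2^b$ can be derived by using (3.1.4), while

$$\mathrm{Cov}(T_2^a, T_2^b) = \frac{1}{n_{a1}}\left\{ \int (F_{a0} - F_{a2})(F_{b0} - F_{b2})\,dF_{a1} - (W_{01}^a - W_{21}^a)(W_{01}^b - W_{21}^b) \right\}. \qquad (5.7.5)$$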
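The covariance (5.7.5) can be checked by simulation. The Python sketch below assumes that the estimator (3.1.2) is the proportion of pairs, $W_{ij} = \#\{X_{ik} \le X_{jl}\}/(n_i n_j)$. It uses hypothetical normal components and a deliberately small shared sample from $F_{a1}$, so that the covariance is not swamped by the variance contributed by the other samples. It then compares the Monte Carlo covariance of $(T_2^a, T_2^b)$ with (5.7.5), evaluated by the trapezoid rule. The two numbers should agree up to simulation error. All names and settings are illustrative.

```python
import bisect
import math
import random

def W_hat(u, v):
    """Proportion of pairs with u_k <= v_l: the assumed form of the estimator (3.1.2)."""
    su = sorted(u)
    return sum(bisect.bisect_right(su, x) for x in v) / (len(u) * len(v))

def Phi(x, mu=0.0):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))

def phi(x, mu=0.0):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

# Hypothetical setting: F_a1 = N(0,1), F_a2 = F_b2 = N(3,1), omega_a = omega_b = 0.5.
WA, WB, MU2 = 0.5, 0.5, 3.0
N_A1, N_A0, N_A2, N_B0, N_B2 = 10, 80, 80, 80, 80

def draw_mixture(rng, w, n):
    return [rng.gauss(0.0, 1.0) if rng.random() < w else rng.gauss(MU2, 1.0) for _ in range(n)]

def simulated_cov(reps=4000, seed=1):
    """Monte Carlo covariance of (T2^a, T2^b) when both use the same sample from F_a1."""
    rng = random.Random(seed)
    ta, tb = [], []
    for _ in range(reps):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(N_A1)]      # shared sample from F_a1
        xa0, xb0 = draw_mixture(rng, WA, N_A0), draw_mixture(rng, WB, N_B0)
        xa2 = [rng.gauss(MU2, 1.0) for _ in range(N_A2)]
        xb2 = [rng.gauss(MU2, 1.0) for _ in range(N_B2)]
        ta.append(W_hat(x1, xa0) + W_hat(xa0, xa2) - W_hat(x1, xa2) - 0.5)
        tb.append(W_hat(x1, xb0) + W_hat(xb0, xb2) - W_hat(x1, xb2) - 0.5)
    ma, mb = sum(ta) / reps, sum(tb) / reps
    return sum((p - ma) * (q - mb) for p, q in zip(ta, tb)) / (reps - 1)

def formula_5_7_5(lo=-10.0, hi=14.0, m=10000):
    """(1/n_a1) Cov(g_a(X), g_b(X)) with X ~ F_a1 and g = F_0 - F_2, by the trapezoid rule."""
    ga = lambda x: WA * Phi(x) + (1 - WA) * Phi(x, MU2) - Phi(x, MU2)
    gb = lambda x: WB * Phi(x) + (1 - WB) * Phi(x, MU2) - Phi(x, MU2)
    h = (hi - lo) / m
    xs = [lo + k * h for k in range(m + 1)]
    wts = [h * (0.5 if k in (0, m) else 1.0) * phi(x) for k, x in enumerate(xs)]
    e_ab = sum(wt * ga(x) * gb(x) for wt, x in zip(wts, xs))
    e_a = sum(wt * ga(x) for wt, x in zip(wts, xs))
    e_b = sum(wt * gb(x) for wt, x in zip(wts, xs))
    return (e_ab - e_a * e_b) / N_A1

sim, theory = simulated_cov(), formula_5_7_5()
print(sim, theory)
```

Only the terms of $T_2^a$ and $T_2^b$ that involve the shared sample from $F_{a1}$ are correlated, which is why (5.7.5) carries the single factor $1/n_{a1}$.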


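Define the matrix

$$\Sigma_1 = \begin{pmatrix} \mathrm{Var}\,T_2^a & \mathrm{Cov}(T_2^a, T_2^b) \\ \mathrm{Cov}(T_2^a, T_2^b) & \mathrm{Var}\,T_2^b \end{pmatrix}. \qquad (5.7.6)$$

Let $N = n_{a0} + n_{a1} + n_{a2} + n_{b0} + n_{b2}$, and assume that as $N \to \infty$, $n_{ij}/N \to r_{ij}$ and that there exists $\varepsilon$ such that $0 < \varepsilon < r_{ij} < 1 - \varepsilon < 1$ for all $n$'s. Then under $H_0$, $N^{1/2}(T_2^a, T_2^b)$ has an asymptotic bivariate normal distribution with mean vector $(0, 0)$ and variance-covariance matrix

$$\lim_{N\to\infty} N\Sigma_1.$$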
Denote the consistent estimators of $\mathrm{Var}\,T_2^a$, $\mathrm{Cov}(T_2^a, T_2^b)$, $\mathrm{Var}\,T_2^b$, which can be derived as in section 3.1, by $\widehat{\mathrm{Var}}\,T_2^a$, $\widehat{\mathrm{Cov}}(T_2^a, T_2^b)$, $\widehat{\mathrm{Var}}\,T_2^b$, respectively, and introduce the matrix

$$S_1 = \begin{pmatrix} \widehat{\mathrm{Var}}\,T_2^a & \widehat{\mathrm{Cov}}(T_2^a, T_2^b) \\ \widehat{\mathrm{Cov}}(T_2^a, T_2^b) & \widehat{\mathrm{Var}}\,T_2^b \end{pmatrix}. \qquad (5.7.7)$$

Then, by Theorems 2.1 and 2.2, the statistic

$$T_{12} = N(T_2^a, T_2^b)(NS_1)^{-1}(T_2^a, T_2^b)' \qquad (5.7.8)$$

has an asymptotic central chi-square distribution with 2 degrees of freedom under $H_0$. In general it has an asymptotic non-central chi-square distribution with 2 degrees of freedom and noncentrality parameter

$$\lim_{N\to\infty} N(ET_2^a, ET_2^b)(N\Sigma_1)^{-1}(ET_2^a, ET_2^b)', \qquad (5.7.9)$$

where $ET$ denotes the expectation of $T$.
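Since $N\,t\,(NS)^{-1}t' = t\,S^{-1}t'$, the factor $N$ cancels in (5.7.8), and the same holds for $T_{13}$ and $T_{14}$ below. The following Python sketch evaluates such a $2\times 2$ quadratic form and the $\chi^2_2$ upper tail $e^{-x/2}$. The statistic values and the matrix entries are hypothetical placeholders for the estimates described in the text.

```python
import math

def quadratic_form_2x2(t, S):
    """t S^{-1} t' for a length-2 vector t and a symmetric positive definite 2x2 matrix S."""
    (s11, s12), (s21, s22) = S
    det = s11 * s22 - s12 * s21
    if s11 <= 0 or det <= 0:
        raise ValueError("S must be positive definite")
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    return sum(t[i] * inv[i][j] * t[j] for i in range(2) for j in range(2))

def chi2_2_upper_tail(x):
    """P(chi-square with 2 degrees of freedom > x) = exp(-x / 2)."""
    return math.exp(-x / 2.0)

# Hypothetical values of (T2^a, T2^b) and of the estimated matrix S1 in (5.7.7).
t = (0.031, 0.018)
S1 = ((0.00030, 0.00012), (0.00012, 0.00025))
T12 = quadratic_form_2x2(t, S1)   # equals N t (N S1)^{-1} t' in (5.7.8)
print(T12, chi2_2_upper_tail(T12))
```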

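Suppose that $F_{a2}$ is known and there are (mutually independent) random samples from each of the other $F$'s. Then, instead of $T_2^a$, we use

$$T_3^a = W_{10}^a + R_{02}^a - R_{12}^a - \tfrac{1}{2} \qquad (5.7.10)$$

to test (5.7.1), and use

$$T_{13} = N(T_3^a, T_2^b)(NS_2)^{-1}(T_3^a, T_2^b)' \qquad (5.7.11)$$

to test $H_0$, where $S_2$ is defined similarly to $S_1$, i.e.

$$S_2 = \begin{pmatrix} \widehat{\mathrm{Var}}\,T_3^a & \widehat{\mathrm{Cov}}(T_3^a, T_2^b) \\ \widehat{\mathrm{Cov}}(T_3^a, T_2^b) & \widehat{\mathrm{Var}}\,T_2^b \end{pmatrix}. \qquad (5.7.12)$$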


5.8  Problem  F3 


Let $\{X_{0i}^a\}$ and $\{X_{0i}^b\}$ be random samples from $F_{a0}$ and $F_{b0}$ with
sizes $n_{a0}$, $n_{b0}$ respectively. Assume that these two samples are mutually
independent. Suppose that each of $F_{a1}$, $F_{a2}$ is continuous and is either
known or there is a random sample from it. We wish to test the following
null hypothesis:

$$H_0: F_{a0} = \omega_a F_{a1} + (1-\omega_a)F_{a2} \qquad (5.8.1)$$

and

$$F_{b0} = \omega_b F_{a1} + (1-\omega_b)F_{a2}. \qquad (5.8.2)$$

Note that the difference between (5.6.1), (5.6.2), (5.7.1), (5.7.2) and
(5.8.1), (5.8.2) is that $F_{b1} = F_{a1}$ and $F_{b2} = F_{a2}$ in (5.8.1), (5.8.2).

If both $F_{a1}$ and $F_{a2}$ are known, we may apply methods (1) and (2) discussed in section 5.6. If only one of $F_{a1}$ and $F_{a2}$ is known while there is a random sample for the other, we may apply the method discussed in section 5.7. Now suppose that for each of $F_{a1}$, $F_{a2}$ there is a random sample $\{X_{1i}^a\}$, $\{X_{2i}^a\}$ with sizes $n_{a1}$, $n_{a2}$ respectively. Also assume that all samples are mutually independent. For testing (5.8.1) we use

$$T_2^a = W_{10}^a + W_{02}^a - W_{12} - \tfrac{1}{2}, \qquad (5.8.3)$$

where $W_{ij}^a$ is defined similarly as in (3.1.2). For testing (5.8.2), we use

$$T_2^b = W_{10}^b + W_{02}^b - W_{12} - \tfrac{1}{2}. \qquad (5.8.4)$$

Then $\mathrm{Var}\,T_2^a$, $\mathrm{Var}\,T_2^b$ can be derived by using (3.1.4), while


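$$\begin{aligned} \mathrm{Cov}(T_2^a, T_2^b) = {} & \frac{1}{n_{a1}}\left\{ \int (F_{a0} - F_{a2})(F_{b0} - F_{a2})\,dF_{a1} - (W_{01}^a - W_{21})(W_{01}^b - W_{21}) \right\} \\ & + \frac{1}{n_{a2}}\left\{ \int (F_{a0} - F_{a1})(F_{b0} - F_{a1})\,dF_{a2} - (W_{02}^a - W_{12})(W_{02}^b - W_{12}) \right\} \\ & + \frac{1}{n_{a1}n_{a2}}\left\{ W_{12}W_{21} - \left[\int F_{a2}^2\,dF_{a1} - W_{21}^2\right] - \left[\int F_{a1}^2\,dF_{a2} - W_{12}^2\right] \right\}. \end{aligned} \qquad (5.8.5)$$

Define the matrix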


$$\Sigma_3 = \begin{pmatrix} \mathrm{Var}\,T_2^a & \mathrm{Cov}(T_2^a, T_2^b) \\ \mathrm{Cov}(T_2^a, T_2^b) & \mathrm{Var}\,T_2^b \end{pmatrix}, \qquad (5.8.6)$$

and denote the corresponding consistent estimator of $\Sigma_3$ by $S_3$.

Let $N = n_{a0} + n_{b0} + n_{a1} + n_{a2}$, and assume that as $N \to \infty$, $n_{ij}/N \to r_{ij}$ and there exists $\varepsilon$ such that $0 < \varepsilon < r_{ij} < 1 - \varepsilon < 1$ for all $n$'s. Then under $H_0$, by Theorems 3.1 and 3.2, the statistic

$$T_{14} = N(T_2^a, T_2^b)(NS_3)^{-1}(T_2^a, T_2^b)' \qquad (5.8.7)$$

has an asymptotic central chi-square distribution with two degrees of freedom. In general, $T_{14}$ has an asymptotic non-central chi-square distribution with two degrees of freedom and noncentrality parameter

$$\lim_{N\to\infty} N(ET_2^a, ET_2^b)(N\Sigma_3)^{-1}(ET_2^a, ET_2^b)'. \qquad (5.8.8)$$

